Abstract

Large scale real-world network data such as social and information networks are ubiquitous. The
study of such social and information networks seeks to find patterns and explain their emergence through
tractable models. In most networks, and especially in social networks, nodes have a rich set of attributes
(e.g., age, gender) associated with them.

Here we present a model that we refer to as the Multiplicative Attribute Graphs (MAG), which 
naturally captures the interactions between the network structure and the node attributes. We consider a 
model where each node has a vector of categorical latent attributes associated with it. The probability of 
an edge between a pair of nodes then depends on the product of individual attribute-attribute affinities. 
The model lends itself to mathematical analysis and we derive thresholds for the connectivity and the
emergence of the giant connected component, and show that the model gives rise to networks with a
constant diameter. We analyze the degree distribution to show that the MAG model can produce networks
with either log-normal or power-law degree distributions depending on certain conditions.

1 Introduction

With the emergence of the Web, large online social computing applications have become ubiquitous, which 
in turn gave rise to a wide range of massive real-world social and information network data such as social 
networks, computer networks, Internet networks, communication networks, e-mail interactions, Web graphs, 
and so on. The unifying theme of studying real-world networks is to find patterns of connectivity and explain 
them through models. The main objective is to answer questions such as "What do real graphs look like?",
"How do they evolve over time?", "How can we synthesize realistic looking graphs?", "How can we find
models that explain the observed patterns?", and "What are algorithmic consequences of the observations
and models?".

Research on empirical observations about the structure of networks and the models giving rise to such
structures go hand in hand. The empirical analysis of large real-world networks aims to discover
common structural properties or patterns, such as heavy-tailed degree distributions [?], local clustering of
edges [43, 30], small diameter [3, 28], navigability [36, 22], emergence of community structure [29], and so
on.

In parallel, there have been efforts to develop the network formation mechanisms that naturally gen- 
erate networks with the observed structural features. In these network formation mechanisms, there have 
been two relatively dichotomous modeling approaches. Broadly speaking, the theoretical computer science 
and physics community have mainly focused on relatively simple "mechanistic" but analytically tractable 



*A short version of this paper appeared in Proceedings of the Seventh Workshop on Algorithms and Models for the Web Graph 
(WAW'10) [19].
network models where connectivity patterns observed in the real-world naturally emerge from the model.
The prime example in this line of work is the Preferential Attachment model with its many variants [?],
which specifies a simple but very natural edge creation mechanism that in the limit leads to
networks with power-law degree distributions. Other models of similar flavor include the Copying Model [23],
the Small-world model [43, 22], Geometric Random Graphs [?], the Forest Fire model [28], the Random
surfer model [?], and models of bipartite affiliation networks [24]. On the other hand, in statistics,
machine learning and traditional social network analysis, a different approach to modeling network data has
emerged. There the effort is in the development of statistically sound models that consider the structure of
the network as well as the features (e.g., age, gender) of nodes and edges in the network. Examples of such
models include the Exponential Random Graphs [?], the Stochastic Block Model [?], and the Latent Space
Model [15].

"Mechanistic" and "Statistical" models. Generally, there has been some gap between the above two 
lines of research. The "mechanistic" models are analytically tractable in a sense that one can mathemati- 
cally analyze properties of the networks that arise from the models. These models emphasize the natural 
emergence of networks that have certain structural properties found in real-world networks. However, such 
models are usually not statistically interesting in a sense that they do not nicely lend themselves to model 
parameter estimation and are generally too simplistic to model heterogeneities between individual nodes. 

In contrast, "statistical" models are generally analytically intractable, and the network properties
do not naturally emerge from the model in general. However, these models are usually accompanied by
statistical procedures for model parameter estimation and are very useful for testing various hypotheses about
the interaction of connectivity patterns and the properties of nodes and edges.

Although models of network structure and formation are seldom both analytically tractable and
statistically interesting, an example of a model satisfying both features is the Kronecker graphs model [26, 44],
which is based on the recursive tensor product of small graph adjacency matrices. The Kronecker graphs
model is analytically tractable in the sense that one can analyze global structural properties of networks that
emerge from the model [32, 25, 6]. In addition, this model is statistically meaningful because there exists
an efficient parameter estimation technique based on maximum likelihood [27, 20]. It has been empirically
shown that with only four parameters Kronecker graphs quite accurately model the global structural
properties of real-world networks such as degree distributions, edge clustering, diameter and spectral properties of
the graph adjacency matrices.

Modeling networks with rich node attribute information. Network models investigate edge creation 
mechanisms, but generally a rich set of attributes is associated with each node. This is especially true in 
social networks, where not only people's connections but also their characteristics, like age, gender, work 
place, habits, etc., have been collected. Similarly, various types of profile information are provided by users
in online social networks. In this sense, both node characteristics and the network structure need to be 
considered simultaneously. 

The attempt to model the interaction between the network structure and node attributes raises a wide 
range of questions. For instance, how do we account for the heterogeneity in the population of the nodes or 
how do we combine node features in an interesting way to obtain probabilities of individual links? While 
the earlier work on a general class of latent space models formulated such questions, most resulting models
were either analytically tractable but statistically uninteresting, or statistically very powerful but did not lend
themselves to mathematical analysis.

To bridge this gap, we propose a class of stochastic network models that we refer to as Multiplicative 
Attribute Graphs (MAG). The model naturally captures the interactions between the network structure and
the node attributes in a clean and tractable manner. We consider a model where each node has a vector
of categorical attributes associated with it. Individual attributes of nodes are then combined in order to
model the emergence of links. The model allows for rich interaction between node features in the sense that
one can simultaneously model features that reflect homophily (i.e., love of the same) as well as heterophily
(i.e., love of the different). For example, if people share certain features like a hobby, they are more likely to be
friends. However, for some other features like gender, people may be more likely to form a relationship with
someone with the opposite characteristic. The proposed MAG model is designed to capture both homophily
and heterophily that naturally occur in social networks.

We proceed by formulating the model and show that it is both analytically tractable and statistically
interesting. In the following sections, we present our mathematical results. Section 3 examines the number
of edges and shows that our model naturally obeys the Densification Power Law [28]. Section 4 examines
the connectivity of the MAG model, which includes the conditions not only for when the network contains a giant
connected component but also for when it becomes connected. Section 5 shows that the diameter of the MAG
model remains small even though the number of nodes is large. Section 6 shows that networks
emerging from the MAG model have a log-normal degree distribution. Furthermore, Section 7 describes a more
general version of the model that can also capture the power-law degree distribution. We view this as
particularly interesting in the light of a long-standing debate about how to distinguish the power-law distribution
from the log-normal distribution in empirical data [37, 38] and what implications this would have for
real-world networks. Also, our results imply that the MAG model is flexible in the sense that networks with
very different properties emerge depending on the parameter configuration. Finally, Section 8 verifies the
properties of the MAG model by simulation experiments. The simulations examine how the
synthetic network changes depending on the parameters as well as how similar the network looks to real-world
networks.

2 Formulation of the Multiplicative Attribute Graph (MAG) model

In this section, we begin with the introduction of the Multiplicative Attribute Graph (MAG) model. Then, we
formulate the general version of the MAG model and present the simplified version that we analyze throughout
this paper. Finally, we investigate the connection to some related work.

2.1 General considerations 

We consider a setting where each node $u$ has a vector $a(u)$ of $l$ categorical (e.g., binary) attributes associated
with it. As a simple example, one can think of such an attribute vector as a sequence of answers to $l$ yes/no
questions such as "Are you female?", "Do you like ice cream?", and so on.

The other essential ingredient of our model is a mechanism that generates the probability of
an edge between two nodes based on their attribute vectors. As mentioned before, we aim to
account for the homophily of certain features as well as the heterophily of others. For this
mechanism, we associate each attribute $i$ (i.e., the $i$-th question) with an attribute-attribute affinity matrix $\Theta_i$.
Each entry of matrix $\Theta_i$ represents the affinity depending on the values of the $i$-th attribute between a pair of
nodes. More precisely, $\Theta_i[z_1, z_2]$ indicates the affinity between a pair of nodes that respectively
take values $z_1$ and $z_2$ for their $i$-th attribute. For the binary attribute example in Figure 1, each $\Theta_i$ is a $2 \times 2$
matrix. To obtain the affinity corresponding to the $i$-th attribute between nodes $u$ and $v$, the values of the $i$-th
attribute of both nodes select an appropriate cell of $\Theta_i$.

Figure 1: Schematic representation of the Multiplicative Attribute Graphs (MAG) model. Given a pair
of nodes $u$ and $v$ with the corresponding binary attribute vectors $a(u)$ and $a(v)$, the probability of edge
$P[u, v]$ is the product over the entries of attribute-attribute affinity matrices $\Theta_i$ where values of $a_i(u)$ and
$a_i(v)$ "select" the appropriate entries (row/column) of $\Theta_i$. Note that this visualized model represents an
undirected graph by making each $\Theta_i$ symmetric. However, the MAG model in general represents a directed
graph.

Figure 2: Structures in which a node attribute can affect link affinity. The widths of arrows correspond to
the affinities towards link formation.



With these affinity matrices, we can capture various types of structure in real-world social networks.
For example, Figure 2 shows four possible linking affinities of a binary attribute. The top figure of each case
visualizes the general structure of the network. Each circle represents the group of nodes that share an attribute
value and the width of each arrow indicates the affinity of link formation in the given direction (e.g., the
arrow 0 → 1 indicates the affinity of link formation between a node with "0"-value of a given attribute and
a node with "1"-value of that attribute). Then, under each figure, we represent the structure in the form of
the affinity matrix.

To investigate these one by one, Figure 2(a) shows the homophily (love of the same) attribute affinity and the
corresponding affinity matrix $\Theta$. Notice the large values on the diagonal entries of $\Theta$, which mean that the link
probability is high when nodes share the same attribute value. The top of the figure demonstrates that there
will be many links between nodes that have the value of the attribute set to "0" and many links between
nodes that have the value "1", but there will be few links between nodes where one has value "0" and the
other "1". Similarly, Figure 2(b) shows the heterophily (love of the different) affinity, where nodes that
do not share the value of the attribute are more likely to link, which gives rise to near-bipartite networks.
Also, Figure 2(c) shows the core-periphery affinity, where links are most likely to form between "0" nodes
(i.e., members of the core) and least likely to form between "1" nodes (i.e., members of the periphery).
Notice that links between the core and the periphery are more likely than the links between the nodes of the
periphery. Additionally, Figure 2(d) illustrates the uniformly random structure that the Erdős-Rényi random
graph model generates. By assigning the same value to every entry in each affinity matrix, we can build a
MAG model equivalent to the Erdős-Rényi random graph model.

From these examples, we notice that the MAG model nicely provides the flexibility in the network 
structure via the affinity matrices. Although we presented the binary and undirected examples, the MAG 
model basically allows more complicated structure with larger cardinalities (e.g., 3 x 3 or 4 x 4) as well as 
asymmetric structure through asymmetric affinity matrices. 

2.2 The Multiplicative Attribute Graph (MAG) model

Now we formulate a general version of the MAG model. To start with, let each node $u$ have a vector of $l$
categorical attributes and let each attribute have cardinality $d_i$ for $i = 1, 2, \ldots, l$. We also have $l$ matrices
$\Theta_i$ of size $d_i \times d_i$ for $i = 1, 2, \ldots, l$. Each entry of $\Theta_i$ is an affinity, a real value between 0 and 1.¹ Then, the
probability of an edge $(u, v)$, $P[u, v]$, is defined as the product of the affinities corresponding to individual
attributes, i.e.,

$$P[u, v] = \prod_{i=1}^{l} \Theta_i[a_i(u), a_i(v)] \qquad (1)$$

where $a_i(u)$ denotes the value of the $i$-th attribute of node $u$. Note that edges appear independently with
probability determined by node attributes and matrices $\Theta_i$. Figure 1 illustrates the model.

One can think of the MAG model in the following sense. In order to construct a social network, we ask
each node $u$ a series of multiple-choice questions and the attribute vector $a(u)$ stores the answers to these
questions. The answers of nodes $u$ and $v$ on a question $i$ select an entry of matrix $\Theta_i$, i.e., $u$'s answer selects
a row and $v$'s answer selects a column. One can thus think of the matrices $\Theta_i$ as the attribute-attribute affinity
matrices. Assuming that the questions are appropriately chosen so that answers are independent of each
other, the product over the entries of matrices $\Theta_i$ can be regarded as the probability of the edge between $u$
and $v$.
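As an illustration of Equation (1), the following minimal sketch multiplies the selected affinity entries for one pair of nodes. The function name `edge_probability` and the two example matrices are assumptions made here for illustration; they are not values from the paper.

```python
def edge_probability(a_u, a_v, thetas):
    """Equation (1): multiply Theta_i[a_i(u)][a_i(v)] over all attributes i."""
    if not (len(a_u) == len(a_v) == len(thetas)):
        raise ValueError("attribute vectors and matrix list must have equal length")
    p = 1.0
    for x, y, theta in zip(a_u, a_v, thetas):
        p *= theta[x][y]  # u's value picks the row, v's value picks the column
    return p


# Hypothetical affinity matrices (illustrative values only).
homophily = [[0.9, 0.1], [0.1, 0.8]]    # large diagonal entries
heterophily = [[0.2, 0.9], [0.9, 0.1]]  # large off-diagonal entries

a_u = [0, 1]  # answers of node u to two yes/no questions
a_v = [0, 0]  # answers of node v
p_uv = edge_probability(a_u, a_v, [homophily, heterophily])
# p_uv is homophily[0][0] * heterophily[1][0]
```

Because each matrix is indexed as row = value at $u$, column = value at $v$, asymmetric matrices would give a directed model without any change to the function.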

The choice of multiplicatively combining entries of $\Theta_i$ is very natural. In particular, the social network
literature defines a concept of Blau-spaces [34, 35] where socio-demographic attributes act as dimensions.
The organizing force in Blau space is homophily, as it has been argued that the flow of information between a
pair of nodes decreases with the "distance" in the corresponding Blau space. In this way, small pockets of
nodes appear and lead to the development of social niches for human activity and social organization. In
this respect, multiplication is a natural way to combine node attribute data (i.e., the dimensions of the Blau
space) so that even a single attribute can have profound impact on the linking structure (i.e., it creates a
narrow social niche community).

The proposed MAG model is analytically tractable in the sense that we can formally analyze the
properties of the model. Moreover, the MAG model is also statistically interesting as it can account for the
heterogeneities in the node population and can be used to study the interaction between properties of nodes
and their linking behavior. In addition, one can pose many interesting statistical inference questions: Given
attribute vectors of all nodes and the network structure, how can we estimate the values of matrices $\Theta_i$?
How can we infer the attributes of unobserved nodes? Or, given a network, how can we estimate both the
node attributes and the matrices $\Theta_i$? However, the focus of this paper is on mathematical analysis and we
leave the questions of MAG model parameter estimation for future work.

¹ Note that there is no condition for $\Theta_i$ to be stochastic; we only require each entry of $\Theta_i$ to be in the interval (0, 1).

2.3 Simplified version of the model

Next we delineate a simplified version of the model that we will mathematically analyze in the following
sections of the paper. First, while the general MAG model applies to directed networks, we consider the
undirected version of the model by requiring each $\Theta_i$ to be symmetric. Second, we assume binary node
attributes and thus affinity matrices $\Theta_i$ have 2 rows and 2 columns. Third, to further reduce the number
of parameters, we also assume that the affinity matrices for all attributes are the same, i.e., $\Theta_i = \Theta$ for
all $i$. These three conditions imply that

$$\Theta = \begin{bmatrix} \alpha & \beta \\ \beta & \gamma \end{bmatrix},$$

i.e., $\Theta[0,0] = \alpha$, $\Theta[0,1] = \Theta[1,0] = \beta$, and
$\Theta[1,1] = \gamma$ for $0 < \alpha, \beta, \gamma < 1$. Furthermore, all our results will hold for $\alpha > \beta > \gamma$. The assumption
$\alpha > \beta > \gamma$ is natural since most large real-world networks have a common onion-like "core-periphery"
structure [29, 30, 25]. Figure 2(c) exhibits this structure. More precisely, the network is composed of
denser and denser layers of edges as one moves towards the core of the network. Basically, $\alpha > \beta > \gamma$
means that more edges are likely to appear between nodes which share value 0 on more attributes and
these nodes form the core of the network. Since more edges appear between pairs of nodes with attribute
combination "0-1" than between those with "1-1", there are more edges between the core and the periphery
nodes (edges "0-1") than between the nodes of the periphery themselves (edges "1-1").

Last, we also assume a simple generative model of node attributes where each binary attribute vector
is generated by $l$ independently and identically distributed coin flips with bias $\mu$. That is, we use an i.i.d.
Bernoulli distribution parameterized by $\mu$ to model attribute vectors, where the probability that the $i$-th
attribute of a node $u$ takes value 0 is $P(a_i(u) = 0) = \mu$ for $i = 1, \ldots, l$ and $0 < \mu < 1$.

Putting it all together, the MAG model $M(n, l, \mu, \Theta)$ is fully specified by six parameters: $n$ is the number
of nodes, $l$ is the number of attributes of each node, $\mu$ is the probability that an attribute takes a value of 0,
and $\Theta = [\alpha \ \beta; \beta \ \gamma]$ specifies the attribute-attribute affinity matrix.

We now study the properties of the random graphs that result from $M(n, l, \mu, \Theta)$, where every
unordered pair of nodes $(u, v)$ is independently connected with probability $P[u, v]$ defined in Equation (1).
Since the probability of an edge decreases exponentially in $l$, the most interesting case occurs when $l = \rho \log n$
for some constant $\rho$.² This choice agrees with the observation that the effective number of dimensions needed to
represent online social networks is of the order of $\log n$ [?].
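The generative process of the simplified model can be sketched as follows. This is a minimal quadratic-time illustration under stated assumptions, not the authors' implementation: the function name `sample_mag`, the seed handling, and the parameter values in the usage lines are choices made here, and self-edges are skipped for simplicity.

```python
import math
import random


def sample_mag(n, l, mu, alpha, beta, gamma, seed=None):
    """Draw one undirected graph from the simplified model M(n, l, mu, Theta)."""
    rng = random.Random(seed)
    theta = [[alpha, beta], [beta, gamma]]
    # Each attribute is 0 with probability mu and 1 otherwise (i.i.d. coin flips).
    attrs = [[0 if rng.random() < mu else 1 for _ in range(l)] for _ in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):  # unordered pairs of distinct nodes only
            p = 1.0
            for x, y in zip(attrs[u], attrs[v]):
                p *= theta[x][y]
            if rng.random() < p:
                edges.append((u, v))
    return attrs, edges


# Illustrative usage with l = rho * log2(n) and assumed parameter values.
n = 64
rho = 1.0
l = round(rho * math.log2(n))
attrs, edges = sample_mag(n, l, mu=0.5, alpha=0.98, beta=0.58, gamma=0.05, seed=0)
```

The double loop costs on the order of $n^2 l$ operations, which is fine for small experiments but would need a smarter sampling scheme for large $n$.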



2.4 Connections to other models 

We note that our model belongs to a general class of latent space network models, where nodes have some
discrete or continuous valued attributes and the probability of linking depends on the attribute values
of the two nodes. For example, the Latent Space Model [18] assumes that nodes reside in $d$-dimensional
Euclidean space and the probability of an edge depends on the Euclidean distance between the locations
of the nodes. Similarly, in Random Dot Product Graphs [?], the linking probability depends on the inner
product between the vectors associated with node positions. Furthermore, the recently introduced Multifractal
Network Generator [39] can also be viewed as a special case of the MAG model where the node attribute
distribution and the affinity matrix are equal for every attribute.

The MAG model generalizes the Kronecker graphs model [25] in a subtle way. The Kronecker graphs
model takes a small (usually $2 \times 2$) initiator matrix $K$ and tensor-powers it $l$ times to obtain a matrix $G$ of
size $2^l \times 2^l$, interpreted as the stochastic graph adjacency matrix. One can think of a Kronecker graph model
as a special case of the MAG model.

² Throughout the paper, $\log(\cdot)$ indicates $\log_2(\cdot)$ unless explicitly specified as $\ln(\cdot)$.

Proposition 2.1 A Kronecker graph $G$ on $2^l$ nodes with a $2 \times 2$ initiator matrix $K$ is equivalent to the
following MAG graph $M$: Let us number the nodes of $M$ as $0, \ldots, 2^l - 1$. Let the binary attribute vector of
a node $u$ of $M$ be a binary representation of its node id, and let $\Theta_i = K$. Then individual edge probabilities
$(u, v)$ of nodes in $G$ match those in $M$, i.e., $P_G[u, v] = P_M[u, v]$.
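The correspondence in Proposition 2.1 can be checked numerically on a small example. The sketch below builds the $l$-th Kronecker power of an initiator matrix with plain nested lists and compares every entry with the MAG edge probability obtained from the binary representation of the node ids. The helper names (`kron`, `kronecker_power`, `bits`, `mag_probability`) and the entries of `K` are assumptions chosen for illustration.

```python
def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as nested lists."""
    return [[a[i][j] * b[p][q] for j in range(len(a)) for q in range(len(b))]
            for i in range(len(a)) for p in range(len(b))]


def kronecker_power(k, l):
    """Tensor-power the initiator matrix k a total of l times."""
    g = [[1.0]]
    for _ in range(l):
        g = kron(g, k)
    return g


def bits(u, l):
    """Binary representation of node id u as l attributes, most significant first."""
    return [(u >> (l - 1 - i)) & 1 for i in range(l)]


def mag_probability(u, v, k, l):
    """MAG edge probability when Theta_i = k and attributes are the id bits."""
    p = 1.0
    for x, y in zip(bits(u, l), bits(v, l)):
        p *= k[x][y]
    return p


K = [[0.98, 0.58], [0.58, 0.05]]  # illustrative initiator matrix
l = 4
G = kronecker_power(K, l)
max_gap = max(abs(G[u][v] - mag_probability(u, v, K, l))
              for u in range(2 ** l) for v in range(2 ** l))
# max_gap should be at floating-point rounding level
```

The match holds because row index $u$ of the Kronecker power decomposes into its binary digits, and each digit pair selects one entry of $K$, exactly as in Equation (1).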

The above observation is interesting for several reasons. First, all results obtained for Kronecker graphs
naturally apply to a subclass of MAG graphs where the node's attribute values are the binary representation
of its id. This means that in a Kronecker graph version of the MAG model each node has a unique
combination of attribute values (i.e., each node has a different node id) and all attribute value combinations are
occupied (i.e., node ids range over $0, \ldots, 2^l - 1$).

Second, building on this correspondence between Kronecker and MAG graphs, we also note that the
estimates of the Kronecker initiator matrix $K$ nicely transfer to the matrix $\Theta$ of the MAG model. For example,
the Kronecker initiator matrix $K = [\alpha = 0.98, \beta = 0.58, \gamma = 0.05]$ accurately models the graph of
Internet connectivity, while the global network structure of the Epinions online social network is captured
by $K = [\alpha = 0.99, \beta = 0.53, \gamma = 0.13]$ [27]. Thus, in the rest of the paper, we will consider the above
values of $K$ as the typical values that the matrix $\Theta$ would normally take. In this respect, the assumption of
$\alpha > \beta > \gamma$ naturally appears.

In the following sections, we analyze the properties of the MAG model. We focus mostly on the simplified
version. Each section states the main theorem and gives the overview of the proof. We omit the full proofs 
in the main body of the paper and describe them in the Appendix. 

3 The Number of Edges 

In this section, we derive the expression for the expected number of edges in the MAG model. Moreover, this
formula can validate not only the assumption $l = \rho \log n$, but also a substantial social network property,
namely the Densification Power Law.

Theorem 3.1 For a MAG graph $M(n, l, \mu, \Theta)$, the number of edges, $m$, satisfies

$$E[m] = \frac{n(n-1)}{2} \left(\mu^2\alpha + 2\mu(1-\mu)\beta + (1-\mu)^2\gamma\right)^l + n\left(\mu\alpha + (1-\mu)\gamma\right)^l.$$

The expression is divided into two different terms. The first term indicates the number of edges between
distinct nodes, whereas the second term is the number of self-edges. If we exclude self-edges, the
number of edges is therefore reduced to the first term.
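The closed form in Theorem 3.1 can be sanity-checked for small $l$ by enumerating all attribute vectors. The sketch below assumes the theorem reads $E[m] = \frac{n(n-1)}{2}(\mu^2\alpha + 2\mu(1-\mu)\beta + (1-\mu)^2\gamma)^l + n(\mu\alpha + (1-\mu)\gamma)^l$ and that each attribute is 0 with probability $\mu$. The function names and the parameter values are illustrative assumptions.

```python
from itertools import product


def expected_edges_formula(n, l, mu, alpha, beta, gamma):
    """Closed form of E[m], including the self-edge term."""
    pair = (mu ** 2 * alpha + 2 * mu * (1 - mu) * beta + (1 - mu) ** 2 * gamma) ** l
    loop = (mu * alpha + (1 - mu) * gamma) ** l
    return n * (n - 1) / 2 * pair + n * loop


def expected_edges_bruteforce(n, l, mu, alpha, beta, gamma):
    """Same expectation, obtained by summing over all 2^l attribute vectors."""
    theta = [[alpha, beta], [beta, gamma]]
    vectors = list(product([0, 1], repeat=l))

    def prob(a):  # probability of drawing attribute vector a
        p = 1.0
        for x in a:
            p *= mu if x == 0 else 1 - mu
        return p

    def affinity(a, b):  # Equation (1) with Theta_i = Theta for all i
        p = 1.0
        for x, y in zip(a, b):
            p *= theta[x][y]
        return p

    pair = sum(prob(a) * prob(b) * affinity(a, b) for a in vectors for b in vectors)
    loop = sum(prob(a) * affinity(a, a) for a in vectors)
    return n * (n - 1) / 2 * pair + n * loop


closed = expected_edges_formula(50, 4, 0.6, 0.9, 0.5, 0.2)
brute = expected_edges_bruteforce(50, 4, 0.6, 0.9, 0.5, 0.2)
# closed and brute should agree up to rounding
```

The enumeration is exponential in $l$, so it only serves as a check on the algebra, not as a practical way to compute $E[m]$.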

Before the actual analysis, we define some useful notation that will be used throughout this paper.
First, let $V$ be the set of nodes in the MAG graph $M(n, l, \mu, \Theta)$. We refer to the weight of a node $u$ as the
number of 0's in its attribute vector, and denote it as $|u|$, i.e., $|u| = \sum_{i=1}^{l} \mathbf{1}\{a_i(u) = 0\}$ where $\mathbf{1}\{\cdot\}$ is an
indicator function. We additionally define $W_j$ as the set of all nodes with the same weight $j$,
i.e., $W_j = \{u \in V : |u| = j\}$ for $j = 0, 1, \ldots, l$. Similarly, $S_j$ denotes the set of nodes with weight
greater than or equal to $j$, i.e., $S_j = \{u \in V : |u| \ge j\}$. By definition, $S_j = \bigcup_{i=j}^{l} W_i$.
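The notation above maps directly onto a few lines of code. In this small sketch, the node labels and the helper names `weight`, `weight_classes`, and `upper_set` are hypothetical and serve only to illustrate the definitions of $|u|$, $W_j$, and $S_j$.

```python
def weight(attributes):
    """|u|: the number of 0's in a node's binary attribute vector."""
    return sum(1 for a in attributes if a == 0)


def weight_classes(attr_by_node, l):
    """W_j for j = 0..l: nodes grouped by their weight."""
    classes = {j: set() for j in range(l + 1)}
    for node, attributes in attr_by_node.items():
        classes[weight(attributes)].add(node)
    return classes


def upper_set(classes, j):
    """S_j: the union of W_i over all i >= j."""
    nodes = set()
    for i, members in classes.items():
        if i >= j:
            nodes |= members
    return nodes


# Hypothetical nodes with l = 3 binary attributes each.
attr_by_node = {"u1": [0, 0, 1], "u2": [1, 1, 1], "u3": [0, 1, 0], "u4": [0, 0, 0]}
W = weight_classes(attr_by_node, 3)
S2 = upper_set(W, 2)
```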

To complete the proof of Theorem 3.1 using the definition of the simplified MAG model, we can derive
the main lemmas as follows:

Lemma 3.2 For distinct $u, v \in V$, $E[P[u, v] \mid u \in W_i] = (\mu\alpha + (1-\mu)\beta)^i (\mu\beta + (1-\mu)\gamma)^{l-i}$.

Lemma 3.3 For $u \in V$, $E[\deg(u) \mid u \in W_i] = (n-1)(\mu\alpha + (1-\mu)\beta)^i (\mu\beta + (1-\mu)\gamma)^{l-i} + 2\alpha^i\gamma^{l-i}$.

By using these lemmas, the outline of the proof for Theorem 3.1 is as follows. Since the number of
edges is half of the degree sum, all we need to do is to sum $E[\deg(u)]$ over the degree distribution. However,
because $E[\deg(u)] = E[\deg(v)]$ if the weights of $u$ and $v$ are the same, we can add up $E[\deg(u) \mid u \in W_i]$
over the weight distribution, i.e., the binomial distribution $\mathrm{Bin}(l, \mu)$.
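The proof outline can be followed numerically. The sketch below assumes that Lemma 3.3 reads $E[\deg(u) \mid u \in W_i] = (n-1)(\mu\alpha + (1-\mu)\beta)^i(\mu\beta + (1-\mu)\gamma)^{l-i} + 2\alpha^i\gamma^{l-i}$, with a self-edge counted twice in the degree. It then averages this over the $\mathrm{Bin}(l, \mu)$ weight distribution and halves the total degree. Function names and parameter values are illustrative assumptions.

```python
from math import comb


def expected_degree_given_weight(n, l, i, mu, alpha, beta, gamma):
    """Assumed form of Lemma 3.3 for a node of weight i."""
    to_others = (n - 1) * (mu * alpha + (1 - mu) * beta) ** i \
        * (mu * beta + (1 - mu) * gamma) ** (l - i)
    self_edge = 2 * alpha ** i * gamma ** (l - i)  # a self-edge adds 2 to the degree
    return to_others + self_edge


def expected_edges_via_degrees(n, l, mu, alpha, beta, gamma):
    """Half of the expected degree sum, weights drawn from Bin(l, mu)."""
    mean_degree = sum(
        comb(l, i) * mu ** i * (1 - mu) ** (l - i)
        * expected_degree_given_weight(n, l, i, mu, alpha, beta, gamma)
        for i in range(l + 1)
    )
    return n * mean_degree / 2


def expected_edges_theorem(n, l, mu, alpha, beta, gamma):
    """Closed form stated in Theorem 3.1."""
    pair = (mu ** 2 * alpha + 2 * mu * (1 - mu) * beta + (1 - mu) ** 2 * gamma) ** l
    loop = (mu * alpha + (1 - mu) * gamma) ** l
    return n * (n - 1) / 2 * pair + n * loop


via_degrees = expected_edges_via_degrees(200, 8, 0.4, 0.98, 0.58, 0.05)
via_theorem = expected_edges_theorem(200, 8, 0.4, 0.98, 0.58, 0.05)
# the two values should coincide up to rounding
```

The agreement follows from the binomial theorem: summing $\binom{l}{i}\mu^i(1-\mu)^{l-i}x^iy^{l-i}$ over $i$ gives $(\mu x + (1-\mu)y)^l$.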

On the other hand, more significantly, Theorem 3.1 implies two substantial features of the MAG model.
First, the assumption that $l = \rho \log n$ for a constant $\rho$ can be validated by the next two corollaries.

Corollary 3.3.1 $m \in o(n)$ with high probability as $n \to \infty$, if $\frac{l}{\log n} > -\frac{1}{\log\left(\mu^2\alpha + 2\mu(1-\mu)\beta + (1-\mu)^2\gamma\right)}$.

Corollary 3.3.2 $m \in \Theta(n^{2-o(1)})$ with high probability as $n \to \infty$, if $l \in o(\log n)$.

Note that $\log\left(\mu^2\alpha + 2\mu(1-\mu)\beta + (1-\mu)^2\gamma\right) < 0$ because both $\mu$ and $\gamma$ are less than 1. Thus, in
order for $M(n, l, \mu, \Theta)$ to have a proper number of edges (e.g., more than $n$), $l$ should be bounded by
the order of $\log n$. On the contrary, since most social networks are sparse, the case $l \in o(\log n)$ can also be
reasonably excluded. In consequence, Corollary 3.3.1 and Corollary 3.3.2 provide the upper and lower
bounds on $l$ for social networks. These bounds support the assumption of $l = \rho \log n$.

Although we do not technically define any process of MAG graph evolution, we can interpret it in
the following way. When a new node joins the network, its behavior is governed by the node attribute
distribution, which is seemingly independent of the network structure. However, in the long term, since the
number of attributes grows slowly as the number of nodes increases, the node attributes and the network
structure are not independent. This phenomenon is somewhat aligned with the real world. When a new
person enters the network, he or she seems to act independently of other people, but people eventually
constitute a structured network at large scale and their behaviors can be categorized into more classes as
the network evolves.

Second, under this assumption, the expected number of edges can be approximately restated as

$$\frac{1}{2} n^{2 + \rho \log\left(\mu^2\alpha + 2\mu(1-\mu)\beta + (1-\mu)^2\gamma\right)}.$$

We find that this fact agrees with the Densification Power Law [28], one of the properties of social networks,
which indicates $m(t) \propto n(t)^a$ for $a > 1$. For example, an instance of the MAG model with $\rho = 1$, $\mu = 0.5$
(Proposition 2.1) would have the densification exponent $a = \log(|\Theta|)$ where $|\Theta|$ denotes the sum of all
entries in $\Theta$.
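The densification exponent can be computed directly from the parameters. The sketch below assumes the exponent is $a = 2 + \rho \log_2(\mu^2\alpha + 2\mu(1-\mu)\beta + (1-\mu)^2\gamma)$, as in the approximation above, and checks the special case $\rho = 1$, $\mu = 0.5$, where it should reduce to $\log_2(\alpha + 2\beta + \gamma)$. The function name and the numeric values are illustrative assumptions.

```python
import math


def densification_exponent(rho, mu, alpha, beta, gamma):
    """Exponent a in m ~ n^a when l = rho * log2(n)."""
    x = mu ** 2 * alpha + 2 * mu * (1 - mu) * beta + (1 - mu) ** 2 * gamma
    return 2 + rho * math.log2(x)


alpha, beta, gamma = 0.98, 0.58, 0.05  # illustrative affinity values
a_general = densification_exponent(1.0, 0.5, alpha, beta, gamma)
a_special = math.log2(alpha + 2 * beta + gamma)  # log of the sum of entries of Theta
# a_general and a_special should coincide, and lie between 1 and 2 for these values
```

For $\rho = 1$ and $\mu = 0.5$ the argument of the logarithm is $(\alpha + 2\beta + \gamma)/4$, so the constant 2 cancels the $\log_2 4$ term.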

The proofs are fully described in the Appendix.



4 Connectivity 

In the previous section, we observed that the MAG model obeys the Densification Power Law. In this section,
we mathematically investigate the MAG model for another general property of social networks, the existence
of a giant connected component. Furthermore, we also examine the situation where this giant component
covers the entire network, i.e., the network is connected.

We begin with the theorems stating when a MAG graph has a giant component and when it becomes connected.

Theorem 4.1 (Giant Component) Only one connected component of size $\Theta(n)$ exists in $M(n, l, \mu, \Theta)$ with
high probability³ as $n \to \infty$, if and only if

$$\left[(\mu\alpha + (1-\mu)\beta)^{\mu} (\mu\beta + (1-\mu)\gamma)^{1-\mu}\right]^{\rho} \ge \frac{1}{2}.$$

³ "With high probability" indicates probability $1 - o(1)$.
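
Theorem 4.2 (Connectedness) Let the connectedness criterion function of $M(n, l, \mu, \Theta)$ be

$$F_c(M) = \begin{cases} (\mu\beta + (1-\mu)\gamma)^{\rho} & \text{when } (1-\mu)^{\rho} \ge \frac{1}{2} \\ \left[(\mu\alpha + (1-\mu)\beta)^{\nu} (\mu\beta + (1-\mu)\gamma)^{1-\nu}\right]^{\rho} & \text{otherwise} \end{cases}$$

where $\nu$ is a solution of

$$\left[\left(\frac{\mu}{\nu}\right)^{\nu} \left(\frac{1-\mu}{1-\nu}\right)^{1-\nu}\right]^{\rho} = \frac{1}{2} \quad \text{in } (0, \mu).$$

Then, $M(n, l, \mu, \Theta)$ is connected with high probability as $n \to \infty$, if $F_c(M) > \frac{1}{2}$. In contrast, $M(n, l, \mu, \Theta)$ is
disconnected with high probability as $n \to \infty$, if $F_c(M) < \frac{1}{2}$.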

To show the above theorems, we first state the monotonicity property of the MAG model.

Theorem 4.3 (Monotonicity) For $u, v \in V$, $P[u, v \mid |u| = i] < P[u, v \mid |u| = j]$ if $i < j$.

Theorem 4.3 ultimately demonstrates that a node of larger weight is more likely to be connected with
other nodes. In other words, a node of large weight plays a "core" role in the network, whereas a node of
small weight is regarded as "periphery". This feature of the MAG model has direct effects on the
connectedness as well as on the existence of a giant component.

By the monotonicity property, the minimum degree is likely to be the degree of the minimum weight
node. Therefore, disconnectedness can be proved by showing that the expected degree of the minimum
weight node is too small for it to be connected with any other node. Conversely, if this lowest degree is large
enough, say $\Omega(\log n)$, then any subset of nodes would be connected with the other part of the graph. Thus,
to show connectedness, the degree of the minimum weight node must be inspected, using
Lemma 3.3.

Note that the criterion in Theorem 4.2 is separated into two cases depending on $\mu$, which tells whether or
not the expected number of weight 0 nodes, $E[|W_0|]$, is greater than 1, because $|W_j|$ is a binomial random
variable. If this expectation is larger than 1, then the minimum weight is likely to be close to 0, i.e., $O(1)$.
Otherwise, if $E[|W_0|] < 1$, the equation for $\nu$ describes the ratio of the minimum weight to $l$ as $n \to \infty$.
Therefore, the condition for connectedness actually depends on the minimum weight node. In fact, the proof
of Theorem 4.2 is accomplished by computing the expected degree of this minimum weight node and using
some techniques introduced in [32].

A similar explanation works for the existence of a giant component. Instead of the minimum weight node,
Theorem 4.1 shows that the existence of a $\Theta(n)$ component relies on the degree of the median weight node.
We intuitively understand this in the following way. We might throw away the lower half of nodes by
degree. If the degree of the median weight node is large enough, then the remaining half of the network is likely to
be connected. The connectedness of this half network implies the existence of a $\Theta(n)$ component, the size of
which is at least $\frac{n}{2}$. In the proof, we actually examine the degrees of nodes of three different weights: $\mu l$,
$\mu l + l^{1/6}$, and $\mu l + l^{2/3}$. The existence of a $\Theta(n)$ component is determined by the degrees of these nodes.

However, the existence of a $\Theta(n)$ component does not necessarily indicate that it is the unique giant 
component, since there might be another $\Theta(n)$ component. Therefore, to prove Theorem 4.1 rigorously, the 
uniqueness of the $\Theta(n)$ component has to follow its existence. We can prove the uniqueness by showing 
that if there are two connected subgraphs of size $\Theta(n)$, then they are connected to each other with high 
probability. 
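
The giant-component intuition above can be explored empirically. The following is a minimal simulation sketch, assuming the simplified MAG model in which each of the $l$ binary attributes takes value 0 with probability $\mu$ and the edge probability is the product of affinity entries over attributes. The function names `sample_mag` and `largest_component_fraction` are illustrative and not from the paper, and the dense $n \times n$ probability matrix limits this sketch to small $n$.

```python
import numpy as np


def sample_mag(n, l, mu, theta, seed=0):
    """Sample attributes and an undirected adjacency matrix (no self-edges)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    # Each attribute is 0 with probability mu and 1 otherwise.
    attrs = (rng.random((n, l)) >= mu).astype(int)
    # Edge probability is the product of affinities over all l attributes.
    prob = np.ones((n, n))
    for j in range(l):
        col = attrs[:, j]
        prob *= theta[col[:, None], col[None, :]]
    # Draw each unordered pair once, then symmetrize.
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return attrs, upper | upper.T


def largest_component_fraction(adj):
    """Fraction of nodes in the largest connected component (iterative DFS)."""
    n = adj.shape[0]
    seen = np.zeros(n, dtype=bool)
    best = 0
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            nbrs = np.flatnonzero(adj[u] & ~seen)
            seen[nbrs] = True
            stack.extend(nbrs.tolist())
        best = max(best, size)
    return best / n
```

Sweeping one parameter (for example $\mu$) while holding the others fixed and plotting `largest_component_fraction` is one way to look for the sharp transition that the theorem describes.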

The proofs of these three theorems are in the Appendix. 

5 Diameter 



Another property of social networks is that the diameter of the network remains small even as the number 
of nodes grows large. We can show this property in the MAG model by applying an idea similar to that in [32]. 

Theorem 5.1 If $(\mu\beta + (1 - \mu)\gamma)^{\rho} > \frac{1}{2}$, then $M(n, l, \mu, \Theta)$ has a constant diameter with high 
probability as $n \to \infty$. 

This theorem does not specify the exact diameter, but, under the given condition, it guarantees the 
bounded diameter even though n — > oo by using the following lemmas: 

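Lemma 5.2 If $(\mu\beta + (1 - \mu)\gamma)^{\rho} > \frac{1}{2}$, for $\lambda = \frac{\mu\beta}{\mu\beta + (1 - \mu)\gamma}$, $S_{\lambda l}$ has a constant diameter with high 
probability as $n \to \infty$. 

Lemma 5.3 If $(\mu\beta + (1 - \mu)\gamma)^{\rho} > \frac{1}{2}$, for $\lambda = \frac{\mu\beta}{\mu\beta + (1 - \mu)\gamma}$, all nodes in $V \setminus S_{\lambda l}$ are directly connected to 
$S_{\lambda l}$ with high probability as $n \to \infty$. 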

By Lemma 5.3, we can conclude that the diameter of the entire graph is at most (2 + diameter of $S_{\lambda l}$). 
Since by Lemma 5.2 the diameter of $S_{\lambda l}$ is constant with high probability under the given condition, the 
actual diameter is also constant. 

The proofs are presented in the Appendix. 
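
As a small worked check of the condition in Theorem 5.1, the sketch below evaluates $(\mu\beta + (1-\mu)\gamma)^{\rho}$ for given parameters. It assumes $\rho = l / \log_2 n$ (base-2 logarithm) and $\lambda = \mu\beta / (\mu\beta + (1-\mu)\gamma)$; the function name `diameter_condition` is hypothetical.

```python
import math


def diameter_condition(n, l, mu, beta, gamma):
    """Return (condition holds, left-hand side value, lambda)."""
    rho = l / math.log2(n)  # assumption: rho = l / log2(n)
    base = mu * beta + (1 - mu) * gamma
    value = base ** rho
    lam = mu * beta / base  # assumed definition of lambda
    return value > 0.5, value, lam
```

For instance, with $n = 1024$, $\mu = 0.5$, $\beta = 0.7$, $\gamma = 0.15$, the condition holds for $l = 5$ but fails for $l = 10$, which illustrates how adding attributes at fixed $n$ makes the graph sparser.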



6 Degree Distribution 

In this section, we analyze the degree distribution of the simplified MAG model under some reasonable 
assumptions.⁴ Depending on $\Theta$, the MAG model produces graphs with various degree distributions. For instance, 
since the network becomes a sparse Erdős-Rényi random graph if $\alpha \approx \beta \approx \gamma < 1$, the degree distribution 
will approximately follow the binomial distribution. For another extreme example, in the case of $\alpha \approx 1$ and 
$\mu \approx 1$, the network will be close to a complete graph, which has a degree distribution different from that of 
a sparse Erdős-Rényi random graph. For this reason, we need to narrow down the conditions on $\mu$ and $\Theta$ as 
follows. If $\mu$ is close to 0 or 1, then the graph becomes an Erdős-Rényi random graph with edge probability 
$p = \alpha$ (when $\mu \approx 1$) or $\gamma$ (when $\mu \approx 0$). Since the degree distribution of the Erdős-Rényi random graph is 
binomial, we exclude these extreme cases of $\mu$. On the other hand, with regard to $\Theta$, we assume that 
a reasonable configuration space for $\Theta$ is where $\frac{\mu\alpha + (1-\mu)\beta}{\mu\beta + (1-\mu)\gamma}$ is between 1.6 and 3. For the previous 
Kronecker graph example, this ratio is actually about 2.44. Our condition on $\Theta$ is 
also supported by real examples in [27]. This condition is crucial for us, since in the analysis we use the fact that 
$\left(\frac{\mu\alpha + (1-\mu)\beta}{\mu\beta + (1-\mu)\gamma}\right)^{x}$ grows faster than a polynomial function of $x$. If $\frac{\mu\alpha + (1-\mu)\beta}{\mu\beta + (1-\mu)\gamma}$ is close to 1, we cannot make 
use of this fact. Assuming all these conditions on $\mu$ and $\Theta$, we obtain the following theorem about the 
degree distribution. 

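Theorem 6.1 In $M(n, l, \mu, \Theta)$ that follows the above assumptions, if 

$$\left[(\mu\alpha + (1-\mu)\beta)^{\mu}\,(\mu\beta + (1-\mu)\gamma)^{1-\mu}\right]^{\rho} > \frac{1}{2},$$

then the tail of the degree distribution, $p_k$, follows a log-normal, specifically, 

$$\mathcal{N}\left(\ln\left(n(\mu\beta + (1-\mu)\gamma)^{l}\right) + l\mu \ln R + \frac{l\mu(1-\mu)(\ln R)^2}{2},\; l\mu(1-\mu)(\ln R)^2\right)$$

for $R = \frac{\mu\alpha + (1-\mu)\beta}{\mu\beta + (1-\mu)\gamma}$ as $n \to \infty$. 

⁴ We trivially exclude self-edges not only because computations become simple but also because other models usually do not 
include them. 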

In other words, the degree distribution of MAG model approximately follows a quadratic relationship on 
log-log scale. This result is nice since some social networks follow the log-normal distribution. For instance, 
the degree distribution of Live Journal network looks more parabolic than linear on log-log scale OTTl . 

In brief, as the expected degree is an exponential function of the node weight by Lemma 3.3, the degree 
distribution is mainly affected by the distribution of node weights. Since the node weight follows a binomial 
distribution, it can be approximated by a normal distribution for sufficiently large $l$. Because the logarithm 
of the expected degree is linear in the node weight and this weight follows a binomial distribution, the 
logarithm of the degree approximately follows a normal distribution for large $l$. This indicates that 
the degree distribution roughly follows a log-normal. 
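
The heuristic in the paragraph above can be made concrete with a short sketch. It assumes, as a simplification, that a node of weight $i$ has expected degree $n\,a^{i} b^{l-i}$ with $a = \mu\alpha + (1-\mu)\beta$ and $b = \mu\beta + (1-\mu)\gamma$ (ignoring the difference between $n$ and $n-1$). Because the log of this quantity is linear in the binomial weight, its mean and variance have closed forms, which the second function confirms by direct summation. Both function names are hypothetical, and these are the parameters of the heuristic only, not necessarily those stated in Theorem 6.1.

```python
import math


def heuristic_log_degree_params(n, l, mu, alpha, beta, gamma):
    """Mean and variance of ln(expected degree) when weight ~ Bin(l, mu)."""
    a = mu * alpha + (1 - mu) * beta
    b = mu * beta + (1 - mu) * gamma
    ln_r = math.log(a / b)
    mean = math.log(n) + l * math.log(b) + l * mu * ln_r
    var = l * mu * (1 - mu) * ln_r ** 2
    return mean, var


def exact_log_degree_moments(n, l, mu, alpha, beta, gamma):
    """Same two moments, computed by summing over the binomial weights."""
    a = mu * alpha + (1 - mu) * beta
    b = mu * beta + (1 - mu) * gamma
    mean = 0.0
    second = 0.0
    for i in range(l + 1):
        w = math.comb(l, i) * mu ** i * (1 - mu) ** (l - i)
        x = math.log(n) + i * math.log(a) + (l - i) * math.log(b)
        mean += w * x
        second += w * x * x
    return mean, second - mean ** 2
```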



Note that there exists a condition, 

$$\left[(\mu\alpha + (1-\mu)\beta)^{\mu}\,(\mu\beta + (1-\mu)\gamma)^{1-\mu}\right]^{\rho} > \frac{1}{2},$$

which is related to 
the existence of a giant component. First, this condition is perfectly acceptable because real-world networks 
have a giant component. Second, as we described in Section 4, this condition ensures that the median degree 
is large enough. Equivalently, it also indicates that the degrees of half of the nodes are large enough. If we 
refer to the tail of the degree distribution as the degrees of nodes with degrees above the median degree, then 
we can show Theorem 6.1. 

The full proofs for this analysis are described in Appendix. 



7 Extensions: Power-Law Degree Distribution 

So far we have handled the simplified version of MAG model parameterized by only few variables. Even 
with these few parameters, many well-known properties of social networks can be reproduced. However, re- 
garding to the degree distribution, even though the log-normal is one of the distributions that social networks 
commonly follow, a lot of social networks also follow the power-law degree distribution lfl"5l . 

In this section, we show that the MAG model produces networks with a power-law degree distribution 
when some constraints are relaxed. We do not attempt to analyze it in a rigorous manner, but give the intuition 
by suggesting an example configuration. We still hold the condition that every attribute is binary and 
independently sampled from a Bernoulli distribution. However, in contrast to the simplified version, we allow 
each attribute to have a different Bernoulli parameter as well as a different attribute-attribute affinity matrix 
associated with it. The formal definition of this model is as follows: 

$$P(a_j(u) = 0) = \mu_j, \qquad P[u, v] = \prod_{j=1}^{l} \Theta_j[a_j(u), a_j(v)].$$

The number of parameters here is $4l$, which consist of the $\mu_j$'s and $\Theta_j$'s for $j = 1, 2, \cdots, l$. For convenience, 
we denote this power-law version of the MAG model as $M(n, l, \vec{\mu}, \vec{\Theta})$ where $\vec{\mu} = \{\mu_1, \cdots, \mu_l\}$ and $\vec{\Theta} = 
\{\Theta_1, \cdots, \Theta_l\}$. With these additional parameters, we are able to obtain the power-law degree distribution as 
the following theorem describes. 

Theorem 7.1 For $M(n, l, \vec{\mu}, \vec{\Theta})$, if $\frac{\mu_j}{1-\mu_j} = \left(\frac{\mu_j\alpha_j + (1-\mu_j)\beta_j}{\mu_j\beta_j + (1-\mu_j)\gamma_j}\right)^{-\delta}$ for $\delta > 0$, then the degree distribution 
satisfies $p_k \propto k^{-\delta - \frac{1}{2}}$ as $n \to \infty$. 

In order to investigate the degree distribution of this model, the following two lemmas are essential. 

Lemma 7.2 The probability that a node $u$ in $M(n, l, \vec{\mu}, \vec{\Theta})$ has an attribute vector $a(u)$ is 

$$\prod_{i=1}^{l} \mu_i^{\mathbf{1}\{a_i(u)=0\}}\,(1-\mu_i)^{\mathbf{1}\{a_i(u)=1\}}.$$

Lemma 7.3 The expected degree of node $u$ in $M(n, l, \vec{\mu}, \vec{\Theta})$ is 

$$(n-1)\prod_{i=1}^{l} (\mu_i\alpha_i + (1-\mu_i)\beta_i)^{\mathbf{1}\{a_i(u)=0\}}\,(\mu_i\beta_i + (1-\mu_i)\gamma_i)^{\mathbf{1}\{a_i(u)=1\}}.$$

By Lemmas 7.2 and 7.3, if the condition in Theorem 7.1 holds, the probability that a node has the same 
attribute vector as node $u$ is proportional to the $(-\delta)$-th power of the expected degree of $u$. In addition, the $(-\frac{1}{2})$-th 
power comes from the Stirling approximation for large $k$. This roughly explains Theorem 7.1. 
The proof is given in the Appendix and the result is also verified by simulation in Figure 5. 
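
The proportionality argument above can be checked numerically. The sketch below assumes the power-law condition has the form $\mu_j / (1-\mu_j) = R_j^{-\delta}$ with $R_j = (\mu_j\alpha_j + (1-\mu_j)\beta_j) / (\mu_j\beta_j + (1-\mu_j)\gamma_j)$ and $\alpha_j > \beta_j > \gamma_j > 0$. Since $R_j$ itself depends on $\mu_j$, the code solves for $\mu_j$ by bisection, which is a choice made here for illustration and not a method taken from the paper. All function names are hypothetical.

```python
import itertools
import math


def ratio(mu, alpha, beta, gamma):
    """R = (mu*alpha + (1-mu)*beta) / (mu*beta + (1-mu)*gamma)."""
    return (mu * alpha + (1 - mu) * beta) / (mu * beta + (1 - mu) * gamma)


def solve_mu(alpha, beta, gamma, delta, tol=1e-12):
    """Find mu in (0, 1) with mu / (1 - mu) = R(mu) ** (-delta) by bisection."""
    def h(mu):
        return mu / (1 - mu) - ratio(mu, alpha, beta, gamma) ** (-delta)

    lo, hi = 1e-12, 1 - 1e-12  # h(lo) < 0 and h(hi) > 0 when beta > gamma > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def prob_and_degree(a, mus, thetas, n):
    """Attribute-vector probability and expected degree for attribute vector a."""
    p, d = 1.0, float(n - 1)
    for ai, mu, (alpha, beta, gamma) in zip(a, mus, thetas):
        if ai == 0:
            p *= mu
            d *= mu * alpha + (1 - mu) * beta
        else:
            p *= 1 - mu
            d *= mu * beta + (1 - mu) * gamma
    return p, d
```

Under these assumptions, the product of the attribute-vector probability and the $\delta$-th power of the expected degree should be the same for every attribute vector, which is what "proportional to the $(-\delta)$-th power" means.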

8 Simulation 

In the previous sections, we performed theoretical analysis of the MAG model. In this section, we use simu- 
lation experiments to further demonstrate the properties of networks that arise from the MAG model. First, 
we generated synthetic MAG graphs with varying parameter values to explore how the network properties 
change as a function of those parameters. We focus on the change of scalar network properties, like diameter 
and the fraction of nodes in the largest connected component of the graph, as a function of the model param- 
eter values. Second, we also ran simulations with fixed parameter configurations to check other properties 
of MAG model that we did not theoretically analyze. In this way, we are able to qualitatively compare our 
model to a real-world network. 

8.1 MAG model parameter space 

Here we focus on the simplified version of the MAG model and examine how various network properties 
vary as a function of parameter settings. We fix all but one parameter and vary the remaining parameter. We 
vary $\mu$, $\alpha$, $f$, and $n$ in $M(n, l, \mu, \Theta)$, where $\alpha$ is the first entry of the affinity matrix $\Theta = [\alpha\ \beta;\ \beta\ \gamma]$ and $f$ 
indicates a scalar factor of $\Theta$, i.e., $\Theta = f \cdot \Theta_0$ for a constant $\Theta_0 = [\alpha_0\ \beta_0;\ \beta_0\ \gamma_0]$. 

Figure 3 depicts the number of edges, the fraction of nodes in the largest connected component, and the 
effective diameter (the 90th-percentile distance [28]) of the network as a function of $\mu$, $\alpha$, $f$, and $n$ for fixed 
$l = 8$. First, we notice that the growth of the network in the number of edges is slower than exponential since 
the curves on the plot grow sub-linearly in Figure 3(a) with a log-scaled y-axis. Note that the network size 
is roughly proportional to $n^2\left(\mu^2\alpha + 2\mu(1-\mu)\beta + (1-\mu)^2\gamma\right)^{l}$ from Theorem 3.1. For example, by this 
formula, the network size is proportional to the $l$-th power of $f$, i.e., the eighth power of $f$ in this case. As the 
expected number of edges is a polynomial function of each variable ($\mu$, $\alpha$, $f$, and $n$), this sublinear growth 
on the log scale agrees with our analysis. Furthermore, the larger the degree of the polynomial function for 
each variable is, the closer to a straight line the network size curve becomes. For instance, the network size 
grows as a polynomial of degree 16 in $\mu$, whereas it grows with degree 2 in $n$. In Figure 3(a), 
we thus observe that the network size growth over $\mu$ is even closer to an exponential curve than that over $n$. 

Second, in Figure 3(b), the size of the largest component shows a sharp thresholding behavior, which 
indicates a rapid emergence of the giant component. This is very similar to thresholding behaviors observed 
in other network models such as the Erdos-Renyi random graphs model lfT4ll . The vertical line in the middle 
of each figure represents the theoretical threshold for the unique giant connected component. As we analyzed, 
each network contains a giant connected component covering at least about half of the nodes at its threshold. 
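
The two quantities discussed above, the edge-count scaling and the giant-component threshold, can be evaluated directly. This is a minimal sketch under stated assumptions: the edge count is taken to be proportional to $n^2(\mu^2\alpha + 2\mu(1-\mu)\beta + (1-\mu)^2\gamma)^{l}$, the threshold statistic is taken to be $[(\mu\alpha + (1-\mu)\beta)^{\mu}(\mu\beta + (1-\mu)\gamma)^{1-\mu}]^{\rho}$ compared against $1/2$, and $\rho = l / \log_2 n$. The function names are hypothetical.

```python
import math


def edge_count_scale(n, l, mu, alpha, beta, gamma):
    """Quantity the expected number of edges is roughly proportional to."""
    zeta = mu ** 2 * alpha + 2 * mu * (1 - mu) * beta + (1 - mu) ** 2 * gamma
    return n ** 2 * zeta ** l


def giant_component_statistic(n, l, mu, alpha, beta, gamma):
    """Left-hand side of the assumed giant-component condition (> 1/2)."""
    rho = l / math.log2(n)  # assumption: base-2 logarithm
    a = mu * alpha + (1 - mu) * beta
    b = mu * beta + (1 - mu) * gamma
    return (a ** mu * b ** (1 - mu)) ** rho
```

Scaling every entry of the affinity matrix by a factor $f$ multiplies `edge_count_scale` by $f^{l}$, which matches the eighth-power dependence noted for $l = 8$.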

Last, while the previous two network properties change monotonically, in Figure 3(c) the effective 
diameter of the network increases quickly up to about the point where the giant connected component forms 
and then drops rapidly after that and approaches a constant value. This behavior is in accordance with 
empirical observations of the "gelling" point where the giant component forms and the diameter starts to 
decrease in the evolution of real-world networks [28, 33]. 

Furthermore, we also performed simulations where we fix $\Theta$ and $\mu$ but simultaneously increase both $n$ 
and $l$ while keeping the ratio $\rho = l / \log n$ constant. Figure 4 plots the change in each network metric (network size, 
fraction of the largest connected component, and effective diameter) as a function of the number of nodes $n$ 
for different values of $\mu$. Each plot effectively represents the evolution of the MAG network as the number 
of nodes grows over time. From the plots, we see that the MAG model follows the densification power law (DPL) 
and the shrinking diameter properties of real-world networks [28]. Depending on the choice of $\mu$, one can 
control the rate of densification and the diameter. 

8.2 Degree Distributions 

In addition to the network size, connectivity, and diameter, we also examined the degree distributions of 
MAG graphs empirically. We already proved that the MAG model can give rise to networks that have either 
a log-normal or a power-law degree distribution depending on the model parameters. Here we generate the 
two versions of networks and compare their degree distributions. 

Figure 5 exhibits the degree distributions of the two types of MAG model. While Figure 5(a) plots the 
degree distributions of the simplified MAG model $M(n, l, \mu, \Theta)$, Figure 5(b) shows those of the power-law 
MAG model $M(n, l, \vec{\mu}, \vec{\Theta})$. For each case, the left plot represents the raw form of the degree histogram, 
whereas the right plot shows the complementary cumulative distribution (CCDF), which smooths out 
the noise. In Figure 5(a), both the raw and CCDF versions of the distribution look parabolic on the log-log 
scale, which verifies that $M(n, l, \mu, \Theta)$ has a log-normal degree distribution. On the other hand, in 
Figure 5(b), both plots exhibit a straight line on the same scale, which indicates that the degree distribution 
of $M(n, l, \vec{\mu}, \vec{\Theta})$ follows a power law. All these experimental results agree with our analyses in Section 6 
and Section 7. 

8.3 Comparison to Real-world Networks 

Also, we qualitatively compare the structural properties of a specific real-world network and the 
corresponding MAG model. This leads to the interesting question of how to find optimal MAG model parameters 
so that the synthetic network resembles the given real-world network. The full resolution of this question 
lies beyond the scope of the present paper; here, we searched by brute force over (the relatively small 
number of) possible MAG parameter settings. We manually selected some parameter settings (for $n$, $l$, $\mu$, $\Theta$) 
to synthesize the simplified MAG model and obtained the properties of $M(n, l, \mu, \Theta)$ to compare the MAG 
model with a real-world network. Our goal is not to claim that these particular parameter values are in any 
way "optimal" for the given real-world network but rather to show that many properties of the MAG model 
exhibit behavior qualitatively similar to that of real-world networks. 

[Figure 3 panels: (a) Network size; (b) Largest connected component; (c) Effective diameter. Surviving x-axis label: Number of nodes (n).] 

Figure 3: Structural properties of a simplified MAG model $M(n, l, \mu, \Theta)$ when we fix $l$ and vary a single 
parameter one by one: $\mu$, $\alpha$, $f$, or $n$. As each parameter increases, in general, the synthetic network becomes 
denser so that a giant connected component emerges and the diameter decreases to approach a constant. 

Figure 4: Structural properties of a simplified MAG graph as a function of the number of nodes $n$ for 
different values of $\mu$ (we fix the affinity matrix $\Theta = [0.85\ 0.7;\ 0.7\ 0.15]$ and the ratio $\rho = l / \log n = 0.596$). 
Observe not only that the relationship between the number of edges and nodes obeys the Densification Power 
Law but also that the diameter begins shrinking after the giant component is formed [33]. 

[Figure 5 panel (b): Power-law MAG, $M(n, l, \vec{\mu}, \vec{\Theta})$. Power-law degree distribution. Surviving x-axis label: Degree.] 

Figure 5: Degree distributions of the simplified and power-law versions of MAG graphs (see Section 7). We plot 
both the PDF and the CCDF of the degree distribution. The simplified version in Figure (a) has a parabolic shape on 
the log-log scale, which is an indication of a log-normal degree distribution. In contrast, the power-law version 
in Figure (b) shows a straight line on the same scale, which demonstrates a power-law degree distribution. 

For the real-world network, we use the Yahoo!-Flickr online social network with 10,240 nodes and 44,800 
edges. For the simplified MAG model $M(n, l, \mu, \Theta)$, we used $l = 8$, $\mu = 0.45$, $\Theta = [0.85\ 0.30;\ 0.30\ 0.25]$ 
with the same number of nodes $n = 10{,}240$. Figure 6(a) and (b) illustrate the following properties of the 
real-world network and the corresponding synthetic network of the simplified MAG model (in the same order as the 
figures). 

• Degree distribution is a histogram of the number of edges of a node fi31l . 

• Singular values indicate the singular values of the adjacency matrix versus their rank ifToll . 

• Singular vector represents the distribution of components in the left singular vector associated with 
the largest singular value lfl2l . 

• Clustering coefficient represents the degree versus the average (local) clustering coefficient of nodes 
of a given degree P31 . 

• Triad participation indicates the number of triangles that a node is adjacent to. It measures the transi- 
tivity in networks BTL 

• Hop plot shows the number of reachable pairs of nodes as the number of hops. It sketches how quickly 
the network expands B0ll27l . 

Figure 6 reveals that the plots of properties of the MAG model resemble those of the Yahoo!-Flickr network. 
Notice the qualitatively similar behavior of nearly all properties between Figure 6(a) and (b). The only property 
where the simplified MAG model does not match the Yahoo!-Flickr network seems to be the clustering 
coefficient. Whereas in real-world networks high degree nodes tend to have lower clustering, in the simplified 
MAG model the situation is reversed: higher degree nodes also tend to have higher clustering. This is 
due to the fact that for all attributes we use the same affinity matrix $\Theta$, which represents only the 
core-periphery structure ($\alpha > \beta > \gamma$). Thus, the simplified MAG model can only resemble the overall 
core-periphery shape of real-world networks. However, in the Yahoo!-Flickr network, we can also discover the 
local clustering effect of homophily and network community formation, which views the network in the 
opposite way compared to the global core-periphery structure. 

Hence, our hypothesis is that the local clustering of nodes would naturally emerge by mixing 
core-periphery affinity matrices ($\alpha > \beta > \gamma$) and homophily affinity matrices ($\alpha, \gamma > \beta$). To investigate this, we 
also generated a synthetic network with the more general version of the MAG model, $M(n, l, \vec{\mu}, \vec{\Theta})$. Figure 6(c) 
illustrates the network properties of this general version. Note that this general version of the model nicely 
captures the heavy-tailed clustering coefficient distribution that the real-world network shows, while the 
simplified version cannot. For the other properties, the general version still exhibits distributions which 
qualitatively seem similar to those of the real-world network. 

By this experiment, we can find that MAG model is capable of representing real-world networks. Fur- 
thermore, we verify the flexibility of MAG model in a sense that it can give rise to networks with different 
network properties depending on the MAG model parameter configuration. 

9 Conclusion 

We presented the Multiplicative Attribute Graph model for real-world networks, which considers the 
categorical node attributes as well as the affinity of link formation depending on the values of node attributes. 
We introduced the attribute-attribute affinity matrix to represent the affinity of link formation and provide 
flexibility in the network structure. 

[Figure 6 panels: (a) Yahoo!-Flickr network; (b) Simplified MAG model; (c) General MAG model.] 

Figure 6: The comparison of network properties between the real-world Yahoo!-Flickr online social network, 
a simplified MAG model network, and a general version of the MAG model. Except for the clustering coefficient, 
the properties of the MAG model qualitatively resemble those of the Yahoo!-Flickr network even when it is 
the simplified version in Figure (b). Moreover, the general version of the MAG model in Figure (c) reproduces all six 
network properties with shapes similar to those of the real-world network. 

On the other hand, the MAG model is both analytically tractable and statistically interesting. In this 
paper, we analytically showed several network properties observed in real-world networks. We proved that 
the MAG model obeys the Densification Power Law. We also showed both the existence of a unique giant 
connected component and a small diameter in the MAG model. Furthermore, we mathematically showed 
that the MAG model gives rise to networks with either a log-normal or a power-law degree distribution. 
Finally, we empirically verified our analytical results. 

The MAG model is statistically interesting in the sense that it can represent various types of network 
structure and leads to the problem of finding such structures in given real-world networks in 
terms of the MAG model parameters. However, we leave the parameter fitting problem as an avenue for 
future work. Furthermore, future work includes other kinds of problems such as how to find underlying 
network structures and missing node attributes when node attributes are only partially observed. 

A Appendix: The Number of Edges 



Proof of Lemma 3.2: Let $N^0_{uv}$ be the number of attributes that take value 0 in both $u$ and $v$. For instance, if 
$a(u) = [0\,0\,1\,0]$ and $a(v) = [0\,1\,1\,0]$, then $N^0_{uv} = 2$. We similarly define $N^1_{uv}$ as the number of attributes 
that take value 1 in both $u$ and $v$. Then, $N^0_{uv}, N^1_{uv} \ge 0$ and $N^0_{uv} + N^1_{uv} \le l$, as $l$ indicates the number of 
attributes in each node. 

By definition of the MAG model, the edge probability between $u$ and $v$ is 

$$P[u, v] = \alpha^{N^0_{uv}}\,\beta^{\,l - N^0_{uv} - N^1_{uv}}\,\gamma^{N^1_{uv}}.$$

Since both $N^0_{uv}$ and $N^1_{uv}$ are random variables, we need their conditional joint distribution to compute 
the expectation of the edge probability $P[u, v]$ given the weight of node $u$. Note that $N^0_{uv}$ and $N^1_{uv}$ are 
independent of each other if the weight of $u$ is given. Let the weight of $u$ be $i$, i.e., $u \in W_i$. Since $u$ and 
$v$ can share value 0 only for the attributes where $u$ already takes value 0, $N^0_{uv}$ equivalently represents the 
number of heads in $i$ coin flips with probability $\mu$. Therefore, $N^0_{uv}$ follows $\mathrm{Bin}(i, \mu)$. Similarly, $N^1_{uv}$ follows 
$\mathrm{Bin}(l - i, 1 - \mu)$. Hence, their conditional joint probability is 

$$P(N^0_{uv} = x,\, N^1_{uv} = y \mid u \in W_i) = \binom{i}{x}\mu^{x}(1-\mu)^{i-x}\binom{l-i}{y}(1-\mu)^{y}\mu^{\,l-i-y}.$$

Using this conditional probability, we can compute the expectation of $P[u, v]$ given the weight of $u$: 

$$E\left[P[u, v] \mid u \in W_i\right] = \sum_{x=0}^{i}\sum_{y=0}^{l-i} P(N^0_{uv} = x,\, N^1_{uv} = y \mid u \in W_i)\,\alpha^{x}\beta^{\,l-x-y}\gamma^{y} = (\mu\alpha + (1-\mu)\beta)^{i}\,(\mu\beta + (1-\mu)\gamma)^{l-i}. \quad\blacksquare$$

Proof of Lemma 3.3: By Lemma 3.2 and the linearity of expectation, we sum this conditional probability 
over all nodes and obtain the expectation of the degree given the weight of node $u$. ■ 
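
The conditional expectation used in these two proofs can be verified by brute force for small $l$. The sketch below enumerates every attribute vector of $v$, weights the edge probability by the probability of that vector, and sums. It assumes the closed form $(\mu\alpha + (1-\mu)\beta)^{i}(\mu\beta + (1-\mu)\gamma)^{l-i}$ for a node $u$ with $i$ zero-valued attributes; the function name is hypothetical.

```python
import itertools


def conditional_edge_prob(a_u, mu, alpha, beta, gamma):
    """E[P[u, v]] over a random v, for a fixed attribute vector a_u of u."""
    theta = ((alpha, beta), (beta, gamma))
    total = 0.0
    for a_v in itertools.product((0, 1), repeat=len(a_u)):
        p_v = 1.0     # probability that v has attribute vector a_v
        p_edge = 1.0  # edge probability between u and v
        for x, y in zip(a_u, a_v):
            p_v *= mu if y == 0 else 1 - mu
            p_edge *= theta[x][y]
        total += p_v * p_edge
    return total
```

For example, `conditional_edge_prob((0, 0, 1, 0), mu, alpha, beta, gamma)` corresponds to a node of weight 3 with $l = 4$.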



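Proof of Theorem 3.1: We compute the expected number of edges, $E[m]$, by adding up the degrees of all nodes 
described in Lemma 3.3: 

$$\begin{aligned}
E[m] &= E\left[\frac{1}{2}\sum_{u \in V} \deg(u)\right] \\
&= \frac{n}{2}\sum_{j=0}^{l}\binom{l}{j}\left((n-1)(\mu\alpha + (1-\mu)\beta)^{j}(\mu\beta + (1-\mu)\gamma)^{l-j} + 2\alpha^{j}\gamma^{\,l-j}\right)\mu^{j}(1-\mu)^{l-j} \\
&= \frac{n(n-1)}{2}\left(\mu^2\alpha + 2\mu(1-\mu)\beta + (1-\mu)^2\gamma\right)^{l} + n\left(\mu\alpha + (1-\mu)\gamma\right)^{l}.
\end{aligned}$$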
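
The collapse of the sum into the closed form follows from the binomial theorem and can be confirmed numerically. This is a sketch assuming the expected degree of a weight-$j$ node is $(n-1)a^{j}b^{l-j} + 2\alpha^{j}\gamma^{l-j}$, with $a = \mu\alpha + (1-\mu)\beta$ and $b = \mu\beta + (1-\mu)\gamma$, where the second term accounts for a self-edge counted twice. Both function names are hypothetical.

```python
import math


def expected_edges_sum(n, l, mu, alpha, beta, gamma):
    """E[m] as half the sum of expected degrees over binomial weights."""
    a = mu * alpha + (1 - mu) * beta
    b = mu * beta + (1 - mu) * gamma
    total = 0.0
    for j in range(l + 1):
        w = math.comb(l, j) * mu ** j * (1 - mu) ** (l - j)
        degree = (n - 1) * a ** j * b ** (l - j) + 2 * alpha ** j * gamma ** (l - j)
        total += w * degree
    return n * total / 2


def expected_edges_closed(n, l, mu, alpha, beta, gamma):
    """Closed form of the same expectation."""
    zeta = mu ** 2 * alpha + 2 * mu * (1 - mu) * beta + (1 - mu) ** 2 * gamma
    return n * (n - 1) / 2 * zeta ** l + n * (mu * alpha + (1 - mu) * gamma) ** l
```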



Proof of Corollary 13.3. It Suppose that I = — j^-J log ra for £ = fi 2 a + 2/i(l — n)j3 + (1 — ^) 2 7 and 

e > 0. By Theorem 13- 1 L the expected number of edges is (ra 2 <^). Note that log£ < since £ < 1. 
Therefore, the expected number of edges is 



G(n 2 C Z ) = 9 (V + ^) = e(?i 1+elogf ) = o(n) . 



Proof of Corollary 13.3.2b Under the situation that I G o(log n), the expected number of edges is 



B Appendix: Connectivity 

Since Theorem 4.3 is used to prove other theorems, we begin with its proof. 

Proof of Theorem 4.3: If $j > i$, for any $v \in W_i$, we can generate a node $v^{(j)} \in W_j$ from $v$ by flipping 
$(j - i)$ attribute values that originally take 1 in $v$. For example, if $a(v) = [0\,1\,1\,0]$, then $a(v^{(3)}) = [0\,0\,1\,0]$ 
or $[0\,1\,0\,0]$. Hence, $P[u, v^{(j)}] \ge P[u, v]$ for $v \in W_i$. 

Here we note that $E\left[P[u, v^{(j)}] \mid v \in W_i\right] = E\left[P[u, v] \mid v \in W_j\right]$, because each $v^{(j)}$ can be 
generated by $\binom{j}{i}$ different $a(v)$ sets with the same probability. Therefore, 

$$E\left[P[u, v] \mid v \in W_j\right] = E\left[E\left[P[u, v^{(j)}] \mid v \in W_i\right]\right] \ge E\left[E\left[P[u, v] \mid v \in W_i\right]\right] = E\left[P[u, v] \mid v \in W_i\right].$$

■ 

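The next theorem plays a key role in proving Theorem 4.1 as well as Theorem 4.2. 

Theorem B.1 Let $|S_j| \in \Theta(n)$ and $E\left[P[u, V \setminus u] \mid u \in W_j\right] > c \log n$ as $n \to \infty$ for some $j$ and sufficiently 
large $c$. Then, $S_j$ is connected with high probability as $n \to \infty$. 

Proof: Let $S'$ be a subset of $S_j$ such that $S'$ is neither an empty set nor $S_j$ itself. Then, the expected number 
of edges between $S'$ and $S_j \setminus S'$ is 

$$E\left[P[S', S_j \setminus S'] \mid |S'| = k\right] = k \cdot (|S_j| - k) \cdot E\left[P[u, v] \mid u, v \in S_j\right]$$

for distinct $u$ and $v$. By Theorem 4.3, 

$$E\left[P[u, v] \mid u, v \in S_j\right] \ge E\left[P[u, v] \mid u \in S_j, v \in V\right] \ge E\left[P[u, v] \mid u \in W_j, v \in V \setminus u\right] > \frac{c \log n}{n}.$$

Given the size of $S'$ as $k$, the probability that there exists no edge between $S'$ and $S_j \setminus S'$ is at most 
$\exp\left(-\frac{1}{2} E\left[P[S', S_j \setminus S'] \mid |S'| = k\right]\right)$ by the Chernoff bound. Therefore, the probability that $S_j$ is disconnected 
is bounded as follows: 

$$\begin{aligned}
P(S_j \text{ is disconnected}) &\le \sum_{S' \subset S_j,\, S' \ne \emptyset, S_j} P(\text{no edge between } S' \text{ and } S_j \setminus S') \\
&\le \sum_{S' \subset S_j,\, S' \ne \emptyset, S_j} \exp\left(-\frac{1}{2} E\left[P[S', S_j \setminus S'] \mid |S'|\right]\right) \\
&\le \sum_{S' \subset S_j,\, S' \ne \emptyset, S_j} \exp\left(-\frac{1}{2}|S'|\,(|S_j| - |S'|)\,\frac{c \log n}{n}\right) \\
&\le 2\sum_{1 \le k \le |S_j|/2} \binom{|S_j|}{k} \exp\left(-\frac{c\,|S_j| \log n}{4n}\,k\right) \\
&\le 2\sum_{1 \le k \le |S_j|/2} |S_j|^{k} \exp\left(-\frac{c\,|S_j| \log n}{4n}\,k\right) \\
&= 2\sum_{1 \le k \le |S_j|/2} \exp\left(\left(\log |S_j| - \frac{c\,|S_j| \log n}{4n}\right)k\right) \\
&\le 2\sum_{1 \le k \le |S_j|/2} \exp\left(-k\,\Theta(\log n)\right) \qquad (\because |S_j| \in \Theta(n)) \\
&= 2\sum_{1 \le k \le |S_j|/2} \left(\frac{1}{n^{\Theta(1)}}\right)^{k} \approx \frac{1}{n^{\Theta(1)}} \in o(1)
\end{aligned}$$

as $n \to \infty$. Therefore, $S_j$ is connected with high probability. ■ 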

Now we turn our attention to the giant connected component. To show its existence, we investigate $S_{\mu l}$, 
$S_{\mu l + l^{1/6}}$, and $S_{\mu l + l^{2/3}}$ depending on the situation. The following lemmas tell us the size of each subgraph. 



Lemma B.2 $|S_{\mu l}| \ge \frac{n}{2} - o(n)$ with high probability as $n \to \infty$. 

Proof: By the Central Limit Theorem, $\frac{|u| - \mu l}{\sqrt{l\mu(1-\mu)}} \sim N(0, 1)$ as $n \to \infty$, i.e., $l \to \infty$. Therefore, $P(|u| \ge \mu l)$ is 
at least $\frac{1}{2} - o(1)$, so $|S_{\mu l}| \ge \frac{n}{2} - o(n)$ with high probability as $n \to \infty$. ■ 

Lemma B.3 $|S_{\mu l + l^{1/6}}| \in \Theta(n)$ with high probability as $n \to \infty$. 

Proof: By the Central Limit Theorem mentioned in Lemma B.2, 

$$P\left(\mu l \le |u| < \mu l + l^{1/6}\right) \approx \Phi\left(\frac{l^{1/6}}{\sqrt{l\mu(1-\mu)}}\right) - \Phi(0) \in o(1)$$

as $l \to \infty$, where $\Phi(z)$ represents the cdf of the standard normal distribution. 

Since $P(|u| \ge \mu l + l^{1/6})$ is still at least $\frac{1}{2} - o(1)$, the size of $S_{\mu l + l^{1/6}}$ is $\Theta(n)$ with high probability as 
$l \to \infty$, i.e., $n \to \infty$. ■ 

Lemma B.4 $|S_{\mu l + l^{2/3}}| \in o(n)$ with high probability as $n \to \infty$. 

Proof: By the Chernoff bound, $P(|u| \ge \mu l + l^{2/3})$ is $o(1)$ as $l \to \infty$; thus $|S_{\mu l + l^{2/3}}|$ is $o(n)$ with high probability 
as $n \to \infty$. ■ 



Using the above lemmas, we show the existence and the uniqueness of the giant connected component under 
the given condition. 



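Proof of Theorem 4.1: 

(Existence). First, if $\left[(\mu\alpha + (1-\mu)\beta)^{\mu}(\mu\beta + (1-\mu)\gamma)^{1-\mu}\right]^{\rho} > \frac{1}{2}$, then by Lemma 3.3, 

$$E\left[P[u, V \setminus u] \mid u \in W_{\mu l}\right] \approx \left[2\left((\mu\alpha + (1-\mu)\beta)^{\mu}(\mu\beta + (1-\mu)\gamma)^{1-\mu}\right)^{\rho}\right]^{\log n} = (1 + \epsilon)^{\log n} > c \log n$$

for some constants $\epsilon > 0$ and $c > 0$. Since $|S_{\mu l}| \in \Theta(n)$ by Lemma B.2, $S_{\mu l}$ is connected with high 
probability as $n \to \infty$ by Theorem B.1. In other words, we are able to extract a connected component 
of size at least $\frac{n}{2} - o(n)$. 

Second, when $\left[(\mu\alpha + (1-\mu)\beta)^{\mu}(\mu\beta + (1-\mu)\gamma)^{1-\mu}\right]^{\rho} = \frac{1}{2}$, we can apply the same argument for 
$S_{\mu l + l^{1/6}}$. Because $|S_{\mu l + l^{1/6}}| \in \Theta(n)$ by Lemma B.3, 

$$\begin{aligned}
E\left[P[u, V \setminus u] \mid u \in W_{\mu l + l^{1/6}}\right] &\approx \left[2\left((\mu\alpha + (1-\mu)\beta)^{\mu}(\mu\beta + (1-\mu)\gamma)^{1-\mu}\right)^{\rho}\right]^{\log n}\left(\frac{\mu\alpha + (1-\mu)\beta}{\mu\beta + (1-\mu)\gamma}\right)^{(\rho \log n)^{1/6}} \\
&= \left(\frac{\mu\alpha + (1-\mu)\beta}{\mu\beta + (1-\mu)\gamma}\right)^{(\rho \log n)^{1/6}} \\
&= (1 + \epsilon')^{(\rho \log n)^{1/6}},
\end{aligned}$$

which is also greater than $c \log n$ as $n \to \infty$ for some constant $\epsilon' > 0$. Thus, $S_{\mu l + l^{1/6}}$ is connected with 
high probability by Theorem B.1. 

Last, on the contrary, when $\left[(\mu\alpha + (1-\mu)\beta)^{\mu}(\mu\beta + (1-\mu)\gamma)^{1-\mu}\right]^{\rho} < \frac{1}{2}$, for $u \in W_{\mu l + l^{2/3}}$, 

$$\begin{aligned}
E\left[P[u, V \setminus u] \mid u \in W_{\mu l + l^{2/3}}\right] &\approx \left[2\left((\mu\alpha + (1-\mu)\beta)^{\mu}(\mu\beta + (1-\mu)\gamma)^{1-\mu}\right)^{\rho}\right]^{\log n}\left(\frac{\mu\alpha + (1-\mu)\beta}{\mu\beta + (1-\mu)\gamma}\right)^{(\rho \log n)^{2/3}} \\
&= \left[(1 - \epsilon'')^{\rho^{-2/3}(\log n)^{1/3}}\left(\frac{\mu\alpha + (1-\mu)\beta}{\mu\beta + (1-\mu)\gamma}\right)\right]^{(\rho \log n)^{2/3}}
\end{aligned}$$

is $o(1)$ as $n \to \infty$ for some constant $\epsilon'' > 0$. Therefore, by Theorem 4.3, the expected degree of a node with 
weight less than $\mu l + l^{2/3}$ is $o(1)$. However, since $|S_{\mu l + l^{2/3}}|$ is $o(n)$ by Lemma B.4, $n - o(n)$ nodes have 
weights less than $\mu l + l^{2/3}$. Hence, most of these $n - o(n)$ nodes are isolated, so the size of the largest component 
cannot be $\Theta(n)$. 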



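(Uniqueness). We already pointed out that either $S_{\mu l}$ or $S_{\mu l + l^{1/6}}$ is a subset of a $\Theta(n)$ component when 
the giant connected component exists. Let this component be $H$. Without loss of generality, suppose that 
$S_{\mu l} \subset H$. Then, for any fixed node $u$, 

$$\begin{aligned}
P[u, H] &\ge P[u, S_{\mu l}] \qquad (\because S_{\mu l} \subset H) \\
&= |S_{\mu l}| \cdot E\left[P[u, v] \mid v \in S_{\mu l}\right] \\
&\ge |S_{\mu l}| \cdot E\left[P[u, v] \mid v \in V \setminus S_{\mu l}\right] \qquad \text{(by Theorem 4.3)} \\
&= \frac{|S_{\mu l}|}{n - |S_{\mu l}|} \cdot P[u, V \setminus S_{\mu l}].
\end{aligned}$$

Since $V \setminus H \subset V \setminus S_{\mu l}$, 

$$E\left[P[u, V \setminus H]\right] \le E\left[P[u, V \setminus S_{\mu l}]\right] \le \left(\frac{n - |S_{\mu l}|}{|S_{\mu l}|}\right) E\left[P[u, H]\right]$$

holds for every $u \in V$. 

Suppose that another connected component $H'$ also contains $\Theta(n)$ nodes. We will show a contradiction 
if $H$ and $H'$ are not connected with high probability as $n \to \infty$. To bound $E[P[H, H']]$, 

$$\begin{aligned}
E\left[P[H, H']\right] &= |H'| \cdot E\left[P[u, H] \mid u \in H'\right] \\
&\ge |H'| \cdot \frac{|S_{\mu l}|}{n - |S_{\mu l}|}\, E\left[P[u, V \setminus H] \mid u \in H'\right] \\
&\ge |H'| \cdot \frac{|S_{\mu l}|}{n - |S_{\mu l}|}\, E\left[P[u, H'] \mid u \in H'\right] \qquad (\because H' \subset V \setminus H \subset V \setminus S_{\mu l}).
\end{aligned}$$

However, $E\left[P[u, H'] \mid u \in H'\right] \in \Omega(1)$. Otherwise, since the probability that $u \in H'$ is connected to 
$H'$ is not greater than $E\left[P[u, H'] \mid u \in H'\right]$ by the Markov inequality, $u$ is disconnected from $H'$ with high 
probability as $n \to \infty$. $H'$ thus includes at least one isolated node with high probability as $n \to \infty$. This 
contradicts the connectedness of $H'$. 

On the other hand, if $E\left[P[u, H'] \mid u \in H'\right] \in \Omega(1)$, then $E\left[P[H, H']\right] \in \Omega(n)$. In this case, by the Chernoff 
bound, $H$ and $H'$ are connected with high probability as $n \to \infty$. This is also a contradiction. Therefore, 
there is no $\Theta(n)$ connected component other than $H$ with high probability as $n \to \infty$. ■ 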

Next, the proofs for the connectedness follow. Before the main proof, we present some necessary lemmas 
and prove them. 

Lemma B.5 $\left(\frac{\mu}{x}\right)^{x}\left(\frac{1-\mu}{1-x}\right)^{1-x}$ is a monotonically increasing function of $x$ over $(0, \mu)$. 
Proof: Let $f(x)$ be the log-value of the given function, i.e., 

$$f(x) = x\,(\log \mu - \log x) + (1 - x)\,(\log(1 - \mu) - \log(1 - x)).$$

Taking the derivative of $f(x)$, 

$$f'(x) = (\log \mu - \log x) + (\log(1 - x) - \log(1 - \mu)).$$

Since $x < \mu$ and $1 - \mu < 1 - x$, $f'(x) > 0$ where $0 < x < \mu$. This implies that $f(x)$ is strictly increasing, 
so the given function is also strictly increasing over $(0, \mu)$. ■ 
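
A quick numerical sanity check of this monotonicity claim is sketched below. It evaluates the function on an evenly spaced grid strictly inside $(0, \mu)$ and confirms that consecutive values increase. The names `g` and `is_increasing_on_grid` are illustrative, and a grid check is only evidence, not a substitute for the derivative argument.

```python
import math


def g(x, mu):
    """(mu/x)^x * ((1-mu)/(1-x))^(1-x), computed through its logarithm."""
    log_value = x * (math.log(mu) - math.log(x)) + (1 - x) * (
        math.log(1 - mu) - math.log(1 - x)
    )
    return math.exp(log_value)


def is_increasing_on_grid(mu, steps=1000):
    """Check g(x, mu) is strictly increasing on a grid inside (0, mu)."""
    xs = [mu * k / (steps + 1) for k in range(1, steps + 1)]
    values = [g(x, mu) for x in xs]
    return all(u < v for u, v in zip(values, values[1:]))
```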



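Lemma B.6 If $(1 - \mu)^{\rho} \ge \frac{1}{2}$, then $\frac{V_{\min}}{l} \to 0$ with high probability as $n \to \infty$. Otherwise, if $(1 - \mu)^{\rho} < \frac{1}{2}$, 
$\frac{V_{\min}}{l} \to \nu$ with high probability as $n \to \infty$, where $\nu$ is a solution of the equation $\left[\left(\frac{\mu}{\nu}\right)^{\nu}\left(\frac{1-\mu}{1-\nu}\right)^{1-\nu}\right]^{\rho} = \frac{1}{2}$ 
in $(0, \mu)$. 

Proof: First, we assume that $(1 - \mu)^{\rho} \ge \frac{1}{2}$, which indicates $n(1 - \mu)^{l} \ge 1$ by definition. Then, the probability 
that $|W_1| = 0$ is at most $\exp\left(-\frac{1}{2} E[|W_1|]\right)$ by the Chernoff bound. However, for fixed $\mu$, 

$$E[|W_1|] = n\binom{l}{1}\mu^{1}(1 - \mu)^{l-1} \ge \frac{\mu}{1 - \mu}\, l \in \Theta(l).$$

Therefore, by the Chernoff bound, $P(|W_1| = 0) \to 0$ as $l \to \infty$. This implies that $V_{\min}$ is $o(l)$ with high 
probability as $n \to \infty$. 

Second, we look at the case that $(1 - \mu)^{\rho} < \frac{1}{2}$. For any $\epsilon \in (0, \mu - \nu)$, using Stirling's approximation, 

$$E\left[|W_{(\nu+\epsilon)l}|\right] = n\binom{l}{(\nu+\epsilon)l}\mu^{(\nu+\epsilon)l}(1 - \mu)^{(1-(\nu+\epsilon))l} \approx \frac{n}{\sqrt{2\pi l(\nu+\epsilon)(1-(\nu+\epsilon))}}\left[\left(\frac{\mu}{\nu+\epsilon}\right)^{\nu+\epsilon}\left(\frac{1-\mu}{1-(\nu+\epsilon)}\right)^{1-(\nu+\epsilon)}\right]^{l}.$$

Since $\left(\frac{\mu}{x}\right)^{x}\left(\frac{1-\mu}{1-x}\right)^{1-x}$ is an increasing function of $x$ over $(0, \mu)$ by Lemma B.5, 

$$\left(\frac{\mu}{\nu+\epsilon}\right)^{\nu+\epsilon}\left(\frac{1-\mu}{1-(\nu+\epsilon)}\right)^{1-(\nu+\epsilon)} = (1 + \epsilon')\left(\frac{1}{2}\right)^{1/\rho}$$

for some constant $\epsilon' > 0$. Therefore, 

$$E\left[|W_{(\nu+\epsilon)l}|\right] \approx \frac{(1 + \epsilon')^{l}}{\sqrt{2\pi l(\nu+\epsilon)(1-(\nu+\epsilon))}}$$

exponentially increases as $l$ increases. By the Chernoff bound, $|W_{(\nu+\epsilon)l}|$ is not zero with high probability as 
$l \to \infty$, i.e., $n \to \infty$. 

In a similar way, 

$$E\left[|W_{(\nu-\epsilon)l}|\right] \approx \frac{(1 - \epsilon')^{l}}{\sqrt{2\pi l(\nu-\epsilon)(1-(\nu-\epsilon))}}$$

exponentially decreases as $l$ increases. Since 
$E[|W_i|] \ge E[|W_j|]$ if $\mu l \ge i \ge j$, the expected number of nodes with weight at most $(\nu - \epsilon)l$ is less than 
$(\nu - \epsilon)l \cdot E\left[|W_{(\nu-\epsilon)l}|\right]$ and its value goes to zero as $l \to \infty$. Hence, by the Chernoff bound, there exists no node 
of weight less than $(\nu - \epsilon)l$ with high probability as $n \to \infty$. 

To sum up, $\frac{V_{\min}}{l}$ goes to $\nu$ with high probability as $l \to \infty$, i.e., $n \to \infty$. ■ 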
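
The value of $\nu$ in this lemma has no closed form in general, but it can be computed numerically. The sketch below uses bisection, which is a choice made here for illustration: the function $\left(\frac{\mu}{x}\right)^{x}\left(\frac{1-\mu}{1-x}\right)^{1-x}$ tends to $1 - \mu$ as $x \to 0^{+}$ and equals 1 at $x = \mu$, so when $(1-\mu)^{\rho} < \frac{1}{2}$ a root of the equation exists in $(0, \mu)$. The names `g` and `solve_nu` are hypothetical.

```python
import math


def g(x, mu):
    """(mu/x)^x * ((1-mu)/(1-x))^(1-x), computed through its logarithm."""
    log_value = x * (math.log(mu) - math.log(x)) + (1 - x) * (
        math.log(1 - mu) - math.log(1 - x)
    )
    return math.exp(log_value)


def solve_nu(mu, rho, tol=1e-12):
    """Solve g(nu, mu) ** rho == 1/2 for nu in (0, mu) by bisection."""
    if (1 - mu) ** rho >= 0.5:
        raise ValueError("no root: this is the case where V_min / l tends to 0")
    lo, hi = 0.0, mu
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid, mu) ** rho < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```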

Using the above lemmas, we show the condition that the network is connected. 

Proof of Theorem 4.2: Let $V_{\min}/l \to t$ for a constant $t \in [0, \mu)$ as $n \to \infty$.

If $\left[(\mu\alpha + (1-\mu)\beta)^{t}(\mu\beta + (1-\mu)\gamma)^{1-t}\right]^{\rho} > \frac{1}{2}$, by Lemma 3.3,

$$\begin{aligned}
E\left[P[u, V \setminus u] \mid u \in W_{V_{\min}}\right] &\approx E\left[P[u, V \setminus u] \mid u \in W_{tl}\right] \\
&= \left[2\,(\mu\alpha + (1-\mu)\beta)^{t\rho}(\mu\beta + (1-\mu)\gamma)^{(1-t)\rho}\right]^{\log n} \\
&= (1+\epsilon)^{\log n} \\
&> c\log n
\end{aligned}$$

for some $\epsilon > 0$ and sufficiently large $c$. Note that $S_{V_{\min}}$ indicates the entire network by definition of $V_{\min}$. Since $|S_{V_{\min}}|$ is $\Theta(n)$, $S_{V_{\min}}$ is connected with high probability as $n \to \infty$ by Theorem B.1. Equivalently, the entire network is also connected with high probability as $n \to \infty$.

On the other hand, when $\left[(\mu\alpha + (1-\mu)\beta)^{t}(\mu\beta + (1-\mu)\gamma)^{1-t}\right]^{\rho} < \frac{1}{2}$, the expected degree of a node with weight $V_{\min}$ is $o(1)$, because from the above relationship $E\left[P[u, V \setminus u] \mid u \in W_{V_{\min}}\right] \approx (1-\epsilon')^{\log n}$ for some $\epsilon' > 0$. Thus, in this case, some node in $W_{V_{\min}}$ is isolated with high probability, so the network is disconnected. ■
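
The step that rewrites the expected degree of a minimum-weight node as a power of $\log n$ can be checked numerically. The sketch below assumes $l = \rho \log_2 n$, as implied by the $\frac{1}{2}$ thresholds in this appendix, and treats $t$ as a given input rather than deriving it; the function name and all sample values are hypothetical.

```python
import math

def expected_min_weight_degree(n, rho, t, mu, alpha, beta, gamma):
    """Return (direct form, rewritten form, base of the rewritten power)."""
    l = rho * math.log2(n)
    x = mu * alpha + (1 - mu) * beta   # factor contributed by a 0-valued attribute
    y = mu * beta + (1 - mu) * gamma   # factor contributed by a 1-valued attribute
    direct = n * x ** (t * l) * y ** ((1 - t) * l)
    base = 2 * (x ** t * y ** (1 - t)) ** rho
    rewritten = base ** math.log2(n)   # uses n = 2^(log2 n)
    return direct, rewritten, base

# Assumed parameters: base > 1 corresponds to the connected regime.
print(expected_min_weight_degree(2 ** 20, 0.5, 0.1, 0.5, 0.9, 0.6, 0.3))
# Assumed parameters: base < 1 corresponds to the disconnected regime.
print(expected_min_weight_degree(2 ** 20, 0.5, 0.1, 0.5, 0.5, 0.3, 0.1))
```

The two forms agree up to floating-point error, and the sign of `base - 1` matches the comparison against $\frac{1}{2}$ in the proof.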



C Appendix: Diameter 

Theorem C.1 ^2 1271/ For an Erdős-Rényi random graph $G(n, p)$, if $(pn)^{d-1}/n \to 0$ and $(pn)^{d}/n \to \infty$ for a fixed integer $d$, then $G(n, p)$ has diameter $d$ with probability approaching 1 as $n \to \infty$.

Proof of Lemma 5.2: Let $A^{G}$ and $A^{H}$ be the probabilistic adjacency matrices of random graphs $G$ and $H$, respectively. If $A^{G}_{ij} \ge A^{H}_{ij}$ for every $i, j$ and $H$ has a constant diameter with high probability, then so does $G$. This can be understood in the following way. To generate a network with $A^{G}$, we first generate edges with $A^{H}$ and then create further edges with $(A^{G} - A^{H})$. However, as the edges created in the first step already result in a constant diameter with high probability, $G$ has a constant diameter.

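Note that $\min_{u,v \in S_{\lambda l}} P[u, v] \ge \beta^{\lambda l}\gamma^{(1-\lambda)l}$. Thus, it is sufficient to prove that the Erdős-Rényi random graph $G(|S_{\lambda l}|, \beta^{\lambda l}\gamma^{(1-\lambda)l})$ has a constant diameter with high probability as $n \to \infty$. However,

$$\begin{aligned}
E\left[|W_{\lambda l}|\right]\beta^{\lambda l}\gamma^{(1-\lambda)l} &= n\binom{l}{\lambda l}\mu^{\lambda l}(1-\mu)^{(1-\lambda)l}\beta^{\lambda l}\gamma^{(1-\lambda)l} \\
&\approx \frac{n}{\sqrt{2\pi l\lambda(1-\lambda)}}\left[\left(\frac{\mu\beta}{\lambda}\right)^{\lambda}\left(\frac{(1-\mu)\gamma}{1-\lambda}\right)^{1-\lambda}\right]^{l} \quad \text{(by Stirling approximation)} \\
&= \frac{n}{\sqrt{2\pi l\lambda(1-\lambda)}}\left(\mu\beta + (1-\mu)\gamma\right)^{l} \\
&= \frac{1}{\sqrt{2\pi l\lambda(1-\lambda)}}\left(2\left(\mu\beta + (1-\mu)\gamma\right)^{\rho}\right)^{\log n} \\
&= \frac{1}{\sqrt{2\pi l\lambda(1-\lambda)}}(1+\epsilon)^{\log n}
\end{aligned}$$

for some $\epsilon > 0$.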

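Since this value goes to infinity as $n \to \infty$, so does $E[|W_{\lambda l}|]$. Therefore, by the Chernoff bound, $|W_{\lambda l}| \ge cE[|W_{\lambda l}|]$ with high probability as $n \to \infty$ for some constant $c$. Then,

$$\begin{aligned}
|S_{\lambda l}|\,\beta^{\lambda l}\gamma^{(1-\lambda)l} &\ge |W_{\lambda l}|\,\beta^{\lambda l}\gamma^{(1-\lambda)l} \\
&\ge cE\left[|W_{\lambda l}|\right]\beta^{\lambda l}\gamma^{(1-\lambda)l} \\
&\approx \frac{c}{\sqrt{2\pi l\lambda(1-\lambda)}}(1+\epsilon)^{\log n}.
\end{aligned}$$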

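By Theorem C.1, an Erdős-Rényi random graph $G\left(|S_{\lambda l}|, \frac{c(1+\epsilon)^{\log n}}{|S_{\lambda l}|\sqrt{2\pi l\lambda(1-\lambda)}}\right)$ has a diameter of at most $\left(1 + \frac{1}{\log(1+\epsilon)}\right)$ with high probability as $n \to \infty$. Thus, the diameters of $G(|S_{\lambda l}|, \beta^{\lambda l}\gamma^{(1-\lambda)l})$ as well as $S_{\lambda l}$ are also bounded by a constant with high probability as $n \to \infty$. ■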






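Proof of Lemma 5.3: For any $u \in V$,

$$\begin{aligned}
P[u, S_{\lambda l}] &\ge \sum_{j=\lambda l}^{l} n\binom{l}{j}\mu^{j}(1-\mu)^{l-j}\beta^{j}\gamma^{l-j} \\
&= \left(2\left(\mu\beta + (1-\mu)\gamma\right)^{\rho}\right)^{\log n}\sum_{j=\lambda l}^{l}\binom{l}{j}\lambda^{j}(1-\lambda)^{l-j}.
\end{aligned}$$

By the Central Limit Theorem, $\sum_{j=\lambda l}^{l}\binom{l}{j}\lambda^{j}(1-\lambda)^{l-j}$ converges to $\frac{1}{2}$ as $l \to \infty$. Therefore, $P[u, S_{\lambda l}]$ is greater than $c\log n$ for a constant $c$, and then, by the Chernoff bound, $u$ is directly connected to $S_{\lambda l}$ with high probability as $n \to \infty$. ■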
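
Two ingredients of this argument are easy to sanity-check numerically: the factorization of $(\mu\beta)^{j}((1-\mu)\gamma)^{l-j}$ through $\lambda = \mu\beta/(\mu\beta + (1-\mu)\gamma)$, and the convergence of the binomial upper tail to $\frac{1}{2}$. The following Python sketch does both under assumed parameter values; the helper names are hypothetical, and the tail is summed in log space to avoid overflow for large $l$.

```python
import math

def lam_from(mu, beta, gamma):
    """lambda = mu*beta / (mu*beta + (1-mu)*gamma)."""
    return mu * beta / (mu * beta + (1 - mu) * gamma)

def binomial_upper_tail(l, lam):
    """Sum over j >= ceil(lam*l) of C(l, j) lam^j (1-lam)^(l-j)."""
    start = math.ceil(lam * l)
    total = 0.0
    for j in range(start, l + 1):
        log_term = (math.lgamma(l + 1) - math.lgamma(j + 1) - math.lgamma(l - j + 1)
                    + j * math.log(lam) + (l - j) * math.log(1.0 - lam))
        total += math.exp(log_term)
    return total

# The tail should drift toward 1/2 as l grows (assumed lambda = 0.3).
for l in (40, 400, 4000):
    print(l, binomial_upper_tail(l, 0.3))
```

For finite $l$ the tail sits slightly off $\frac{1}{2}$ because the term at $j = \lambda l$ has non-negligible mass; that gap shrinks at rate $1/\sqrt{l}$.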

D Appendix: Degree Distribution 

Theorem D.1 M3/ $P(\deg(u) = k) = \int_{u \in V}\binom{n-1}{k}\left(E[P[u, v]]\right)^{k}\left(1 - E[P[u, v]]\right)^{n-1-k}\,du$.

Corollary D.1.1 For $E_j = (\mu\alpha + (1-\mu)\beta)^{j}(\mu\beta + (1-\mu)\gamma)^{l-j}$, the probability of degree $k$ in $M(n, l, \mu, \Theta)$ is

$$p_k = \binom{n-1}{k}\sum_{j=0}^{l}\binom{l}{j}\mu^{j}(1-\mu)^{l-j}E_j^{k}(1 - E_j)^{n-1-k}.$$

Proof: To reformulate Theorem D.1,

$$P(\deg(u) = k) = \sum_{j=0}^{l}P(u \in W_j)\binom{n-1}{k}\left(E[P[u, v] \mid u \in W_j]\right)^{k}\left(1 - E[P[u, v] \mid u \in W_j]\right)^{n-1-k}.$$

Therefore, by applying Lemma 3.2, we obtain the desired formula.

Proof of Theorem 6.1: To save space, we begin by defining some notation as follows:

$$\begin{aligned}
x &= \mu\alpha + (1-\mu)\beta \\
y &= \mu\beta + (1-\mu)\gamma \\
f_j(k) &= \binom{n-1}{k}\left(x^{j}y^{l-j}\right)^{k}\left(1 - x^{j}y^{l-j}\right)^{n-1-k} \\
g_j(k) &= \binom{l}{j}\mu^{j}(1-\mu)^{l-j}f_j(k).
\end{aligned}$$

By Corollary D.1.1, we can restate $p_k$ as $p_k = \sum_{j=0}^{l}g_j(k)$.

If most of those terms turn out to be insignificant under our assumptions, the probability $p_k$ can be approximately proportional to one or a few dominant terms. In this case, what we need to do is to seek the $j$ that maximizes $g_j(k) = \binom{l}{j}\mu^{j}(1-\mu)^{l-j}f_j(k)$ and find its approximate formula.
We start with the approximation of $f_j(k)$. For large $n$ and $k$, by Stirling approximation,

$$\begin{aligned}
f_j(k) &\approx \frac{\sqrt{2\pi n}\,(n/e)^{n}\left(x^{j}y^{l-j}\right)^{k}\left(1 - x^{j}y^{l-j}\right)^{n-k}}{\sqrt{2\pi k}\,(k/e)^{k}\sqrt{2\pi(n-k)}\,((n-k)/e)^{n-k}} \\
&= \frac{1}{\sqrt{2\pi k\left(1 - \frac{k}{n}\right)}}\left(\frac{nx^{j}y^{l-j}}{k}\right)^{k}\left(\frac{1 - x^{j}y^{l-j}}{1 - k/n}\right)^{n-k}.
\end{aligned}$$

However, the expected degree of the maximum weight node is $O\left(n(\mu\alpha + (1-\mu)\beta)^{l}\right)$, and so is the expected maximum degree. Thus $k$ is $o(n)$ with high probability as $n \to \infty$, i.e., $l \to \infty$, and

$$\left(\frac{1 - x^{j}y^{l-j}}{1 - k/n}\right)^{n-k} \approx \exp\left(-(n-k)x^{j}y^{l-j} + (n-k)k/n\right) \approx \exp\left(-nx^{j}y^{l-j} + k\right).$$



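For sufficiently large $l$, we can further simplify $g_j(k)$ by the normal approximation of the binomial distribution:

$$\begin{aligned}
\ln g_j(k) &= \ln\left[\binom{l}{j}\mu^{j}(1-\mu)^{l-j}\right] + \ln f_j(k) \\
&\approx -\frac{1}{2}\ln\left(2\pi l\mu(1-\mu)\right) - \frac{1}{2l\mu(1-\mu)}(j - \mu l)^{2} + \ln f_j(k) \\
&\approx C - \frac{1}{2l\mu(1-\mu)}(j - \mu l)^{2} - \frac{1}{2}\ln k - k\ln\frac{k}{nx^{j}y^{l-j}} - nx^{j}y^{l-j} + k
\end{aligned}$$

for some constant $C$. When $k = nx^{\tau}y^{l-\tau}$ for $\tau \ge \mu l$ and $R = \frac{x}{y}$,

$$\ln g_j(k) \approx C - \frac{1}{2l\mu(1-\mu)}(j - \mu l)^{2} - \frac{1}{2}\ln k + k(j - \tau)\ln R + k\left(1 - R^{j-\tau}\right).$$

Using $(j - \mu l)^{2} = (j - \tau)^{2} + (\tau - \mu l)^{2} + 2(j - \tau)(\tau - \mu l)$,

$$\ln g_j(k) \approx C_{\tau} - \frac{(j - \tau)^{2}}{2l\mu(1-\mu)} + (j - \tau)\left(k\ln R - \frac{\tau - \mu l}{l\mu(1-\mu)}\right) + k\left(1 - R^{j-\tau}\right) - \frac{1}{2}\ln k$$

for $C_{\tau} = C - \frac{(\tau - \mu l)^{2}}{2l\mu(1-\mu)}$.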

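Considering $g_j(k)$ as a function of $j$, not $k$, we now find the $j$ that maximizes $g_j(k)$ for $k = nx^{\tau}y^{l-\tau}$. However, the median weight is approximately equal to $\mu l$ by the Central Limit Theorem. If we focus on the higher half of the degrees, we can thus let $\tau \ge \mu l$. In this case, since $\left[(\mu\alpha + (1-\mu)\beta)^{\mu}(\mu\beta + (1-\mu)\gamma)^{1-\mu}\right]^{\rho} > \frac{1}{2}$,

$$k \ge n(\mu\alpha + (1-\mu)\beta)^{\mu l}(\mu\beta + (1-\mu)\gamma)^{(1-\mu)l} \in \Omega(l).$$

If we differentiate $\ln g_j(k)$ over $j$,

$$\left(\ln g_j(k)\right)' \approx -\frac{j - \tau}{l\mu(1-\mu)} + k\ln R - \frac{\tau - \mu l}{l\mu(1-\mu)} - kR^{j-\tau}\ln R = 0.$$

Because $k \in \Omega(l)$ and $j, \tau \in O(l)$, we can conclude that $R^{j-\tau} \approx 1$ as $n \to \infty$; otherwise, $\left|\left(\ln g_j(k)\right)'\right|$ grows as large as $\Omega(k)$. Therefore, $g_j(k)$ is maximized when $j \approx \tau$.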
Furthermore, since $\left|\frac{j - \tau}{2l\mu(1-\mu)}\right| \ll k\ln R$ as $n \to \infty$, the first quadratic term $\frac{(j - \tau)^{2}}{2l\mu(1-\mu)}$ in $\ln g_j(k)$ is negligible. As a result, when $R$ is practical (close to 1.6 ∼ 3), $\ln g_{\tau+\Delta}$ would be at most $\left(\Theta(-k|\Delta|) + \ln g_{\tau}\right)$ for $|\Delta| \ge 1$. After all, $g_{\tau}$ effectively dominates the probability $p_k$, i.e., $\ln p_k$ is roughly proportional to $\ln g_{\tau}$. By assigning $\tau = \frac{\ln k - \ln ny^{l}}{\ln R}$, we obtain

$$\begin{aligned}
\ln g_{\tau}(k) &\approx C - \frac{1}{2l\mu(1-\mu)}\left(\frac{\ln k - \ln ny^{l}}{\ln R} - \mu l\right)^{2} - \frac{1}{2}\ln k \\
&= C'' - \frac{1}{2l\mu(1-\mu)(\ln R)^{2}}\left(\ln k - \ln ny^{l} - l\mu\ln R - \frac{l\mu(1-\mu)}{2}(\ln R)^{2}\right)^{2} - \ln k
\end{aligned}$$

for some constant $C''$. Therefore, the degree distribution approximately follows the log-normal as described in Theorem 6.1. ■
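
To make the objects in this proof concrete, here is a minimal Python sketch that evaluates the exact mixture $p_k$ from Corollary D.1.1 and the log-normal parameters read off from the last display above (mean $\ln ny^{l} + l\mu\ln R + \frac{1}{2}l\mu(1-\mu)(\ln R)^{2}$ and variance $l\mu(1-\mu)(\ln R)^{2}$). The function names and the sample parameters are assumptions for illustration; the sketch does not attempt to reproduce any experiment from the paper.

```python
import math

def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def degree_pmf(k, n, l, mu, alpha, beta, gamma):
    """p_k as a binomial mixture over node weights j = 0..l (Corollary D.1.1)."""
    x = mu * alpha + (1 - mu) * beta
    y = mu * beta + (1 - mu) * gamma
    total = 0.0
    for j in range(l + 1):
        e_j = x ** j * y ** (l - j)  # edge probability for a weight-j node
        log_term = (log_binom(l, j) + j * math.log(mu) + (l - j) * math.log(1 - mu)
                    + log_binom(n - 1, k) + k * math.log(e_j)
                    + (n - 1 - k) * math.log1p(-e_j))
        total += math.exp(log_term)
    return total

def lognormal_parameters(n, l, mu, alpha, beta, gamma):
    """Mean and variance of ln k suggested by the final display of the proof."""
    x = mu * alpha + (1 - mu) * beta
    y = mu * beta + (1 - mu) * gamma
    ln_r = math.log(x / y)
    var = l * mu * (1 - mu) * ln_r ** 2
    mean = math.log(n) + l * math.log(y) + l * mu * ln_r + 0.5 * var
    return mean, var

# Assumed small instance: n = 64 nodes, l = 6 attributes.
print(sum(degree_pmf(k, 64, 6, 0.5, 0.9, 0.5, 0.2) for k in range(64)))
print(lognormal_parameters(64, 6, 0.5, 0.9, 0.5, 0.2))
```

The first printed value should be close to 1, since $p_k$ is a mixture of binomial distributions. Comparing `degree_pmf` against the log-normal density is only meaningful for much larger $n$ and $l$, where the approximations in the proof apply.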



E Appendix: Power-law Distribution 



Proof of Lemma 7.2: Since the $a_i$'s are independently distributed Bernoulli random variables, Lemma 7.2 holds. ■

Proof of Lemma 7.3: Let us define $P_j(u, v)$ as the edge probability between $u$ and $v$ when considering only up to the $j$-th attribute, i.e.,

$$P_j(u, v) = \prod_{i=1}^{j}\Theta_i[a_i(u), a_i(v)].$$

Thus, what we aim to show is that for a node $v$,

$$E[P_l(u, v)] = \prod_{i=1}^{l}\left(\mu_i\alpha_i + (1-\mu_i)\beta_i\right)^{1\{a_i(u)=0\}}\left(\mu_i\beta_i + (1-\mu_i)\gamma_i\right)^{1\{a_i(u)=1\}}.$$

When $l = 1$, it is trivially true by Lemma 3.2. When $l > 1$, suppose that the above formula holds for $l = 1, 2, \cdots, k$. Since $P_{k+1}(u, v) = P_k(u, v)\,\Theta_{k+1}[a_{k+1}(u), a_{k+1}(v)]$,

$$\begin{aligned}
E[P_{k+1}(u, v)] &= E[P_k(u, v)]\,E\left[\Theta_{k+1}[a_{k+1}(u), a_{k+1}(v)]\right] \\
&= E[P_k(u, v)]\left(\mu_{k+1}\alpha_{k+1} + (1-\mu_{k+1})\beta_{k+1}\right)^{1\{a_{k+1}(u)=0\}}\left(\mu_{k+1}\beta_{k+1} + (1-\mu_{k+1})\gamma_{k+1}\right)^{1\{a_{k+1}(u)=1\}} \\
&= \prod_{i=1}^{k+1}\left(\mu_i\alpha_i + (1-\mu_i)\beta_i\right)^{1\{a_i(u)=0\}}\left(\mu_i\beta_i + (1-\mu_i)\gamma_i\right)^{1\{a_i(u)=1\}}.
\end{aligned}$$

Therefore, the expected degree formula described in Lemma 7.3 holds for every $l \ge 1$. ■

Proof of Theorem 7.1: Before the main argument, we need to define the ordered probability mass of attribute vectors as $p_{(j)}$ for $j = 1, 2, \cdots, 2^{l}$. For example, if the probability of each attribute vector (00, 01, 10, 11) is respectively 0.2, 0.3, 0.4, and 0.1 when $l = 2$, the ordered probability mass is $p_{(1)} = 0.1$, $p_{(2)} = 0.2$, $p_{(3)} = 0.3$, and $p_{(4)} = 0.4$.

Then, by Theorem D.1, we can express the probability of degree $k$, $p_k$, as follows:

$$p_k = \binom{n-1}{k}\sum_{j=1}^{2^{l}}p_{(j)}(E_j)^{k}(1 - E_j)^{n-1-k} \qquad (2)$$

where $E_j$ denotes the average edge probability of the node which has the attribute vector corresponding to $p_{(j)}$. If the $p_{(j)}$'s and $E_j$'s are configured so that few terms dominate the probability, we may approximate $p_k$ as $\binom{n-1}{k}p_{(\tau)}(E_{\tau})^{k}(1 - E_{\tau})^{n-1-k}$ for $\tau = \arg\max_j p_{(j)}(E_j)^{k}(1 - E_j)^{n-1-k}$. Assuming that this approximation holds, we will propose a sufficient condition for the power-law degree distribution and suggest an example for this condition.

To simplify computations, we propose a condition that $p_{(j)} \propto E_j^{-\delta}$ for a constant $\delta$. Then, the $j$-th term is

$$\binom{n-1}{k}p_{(j)}(E_j)^{k}(1 - E_j)^{n-1-k} \propto \binom{n-1}{k}(E_j)^{k-\delta}(1 - E_j)^{n-1-k},$$

which is maximized when $E_j \approx \frac{k-\delta}{n-1-\delta}$. Moreover, under this condition, if $E_{j+1}/E_j$ is at least $(1+z)$ for a constant $z > 0$, then

$$\frac{p_{(\tau+\Delta)}(E_{\tau+\Delta})^{k}(1 - E_{\tau+\Delta})^{n-1-k}}{p_{(\tau)}(E_{\tau})^{k}(1 - E_{\tau})^{n-1-k}}$$

is $o(1)$ for $|\Delta| \ge 1$ as $n \to \infty$. Therefore, the $\tau$-th term dominates Equation (2).
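Next, by the Stirling approximation with the above conditions,

$$\begin{aligned}
p_k &\propto \binom{n-1}{k}\left(\frac{k-\delta}{n-1-\delta}\right)^{k-\delta}\left(\frac{n-1-k}{n-1-\delta}\right)^{n-1-k} \\
&\propto \frac{(k-\delta)^{-\delta}}{\sqrt{k(n-1-k)}}\left(\frac{(n-1)(k-\delta)}{k(n-1-\delta)}\right)^{k}\left(\frac{n-1}{n-1-\delta}\right)^{n-1-k}
\end{aligned}$$

for sufficiently large $k$ and $n$. Thus, $p_k$ is approximately proportional to $k^{-\delta-\frac{1}{2}}$ for large $k$ as $n \to \infty$.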

Last, we prove that the two conditions for the power-law degree distribution are simultaneously feasible 
by providing an example configuration. 



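If every $p_{(j)}$ is distinct and $\frac{\mu_i}{1-\mu_i} = \left(\frac{\mu_i\alpha_i + (1-\mu_i)\beta_i}{\mu_i\beta_i + (1-\mu_i)\gamma_i}\right)^{-\delta}$, then we satisfy the condition that $p_{(j)} \propto (E_j)^{-\delta}$ by Lemma 7.2 and Lemma 7.3. On the other hand, if we set $\frac{\mu_i}{1-\mu_i} = (1+z)^{-2^{i}\delta}$, then the other condition, $E_{j+1}/E_j \ge (1+z)$, is also satisfied. Since we are free to configure the $\mu_i$'s and $\Theta_i$'s independently, the sufficient condition for the power-law degree distribution is feasible. ■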
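
The first feasibility condition can be illustrated with a small numerical check. The Python sketch below assumes, as in Lemma 7.3, that attribute $i$ takes value 0 with probability $\mu_i$, picks $\alpha_i$ so that $\frac{\mu_i}{1-\mu_i}$ equals the required power of the ratio, and then confirms that $p(a)\,E(a)^{\delta}$ is the same for every attribute vector $a$. The helper names and all numeric values are hypothetical choices for this example only.

```python
import itertools
import math

def solve_alpha(mu, beta, gamma, delta):
    """Choose alpha so that mu/(1-mu) = (x/y)^(-delta),
    where x = mu*alpha + (1-mu)*beta and y = mu*beta + (1-mu)*gamma."""
    ratio = (mu / (1 - mu)) ** (-1.0 / delta)  # required value of x / y
    y = mu * beta + (1 - mu) * gamma
    x = ratio * y
    return (x - (1 - mu) * beta) / mu

def mass_times_edge_power(mus, thetas, delta):
    """Return p(a) * E(a)^delta for every attribute vector a in {0,1}^l."""
    values = []
    for a in itertools.product((0, 1), repeat=len(mus)):
        p, e = 1.0, 1.0
        for a_i, mu, (alpha, beta, gamma) in zip(a, mus, thetas):
            x = mu * alpha + (1 - mu) * beta
            y = mu * beta + (1 - mu) * gamma
            p *= mu if a_i == 0 else 1 - mu   # probability mass of this attribute value
            e *= x if a_i == 0 else y         # expected edge-probability factor
        values.append(p * e ** delta)
    return values

# Assumed configuration with l = 2 attributes and delta = 1.
delta = 1.0
mus = [0.4, 0.3]
betas_gammas = [(0.3, 0.2), (0.2, 0.1)]
thetas = [(solve_alpha(m, b, g, delta), b, g) for m, (b, g) in zip(mus, betas_gammas)]
print(thetas)
print(mass_times_edge_power(mus, thetas, delta))
```

All printed products coincide, which is exactly the statement $p_{(j)} \propto E_j^{-\delta}$. The sketch does not check the second condition on $E_{j+1}/E_j$; that would require additionally fixing the ratios across attributes.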